Samuel B. Lyerly


Although  this  paper  is  the  first  on  the  program,  it  is  in  no 
sense  a  "key-note  address"  or  an  introduction  to  the  presentations  that 
will  follow.  It  is  mainly  an  attempt  to  propose  some  questions  that  I 
and  several  others  have  been  concerned  about  and  for  which  we  hope  to  get 
from  this  meeting  some  useful  insights,  if  not  definitive  answers-- if  not 
from  the  papers,  perhaps  from  the  informal  discussions  for  which  we  have 
budgeted  a  liberal  proportion  of  time. 

We all know that many problems in cluster analysis are common to
various  fields,  including  the  several  represented  here;  and  if  I  lapse 
into  the  language  of  psychology  from  time  to  time  I  am  sure  that  you  will 
have  no  trouble  making  appropriate  translations.  And  if  some  of  my  remarks 
seem  critical  of  certain  work  that  has  been  done,  I  am  sure  that  you  will 
understand  that  I  am  referring  to  others  who  are  not  present  in  this  room. 

I  think  it  is  well  for  us  to  remind  ourselves  that  even  in  this 
enlightened  decade  "typological"  concepts  are  controversial  with  many  of 
our  colleagues  in  the  behavioral  sciences.  Back  in  my  undergraduate 
days  in  psychology  the  prevailing  doctrine  was  that  individual  differences 
are  essentially  quantitative  rather  than  qualitative  (and,  if  you  used  an 
appropriate  measuring  instrument  and  followed  the  instructions  in  the 
manual, they should all be "normally distributed"). Even unmistakably
aberrant  behavior,  when  it  could  not  be  linked  to  some  physical  injury 
or  disease  or  to  a  genetic  origin,  was  likely  to  be  regarded  as  an  extreme 
manifestation  of  some  "normal"  dimension  of  behavior.  In  recent  years, 
however,  there  seems  to  have  developed  a  growing  suspicion  that  there  may 
be  ways  of  assigning  people  to  groups  or  types  or  diagnostic  categories 
in  such  a  way  that  knowing  a  person's  classification  will  significantly 
aid  professional  workers  in  helping  him  in  medical,  vocational,  educational, 
or  other  situations.  I  am  sure  that  if  we  did  not  share  this  point  of 
view,  or  were  not  members  of  this  "type,"  we  would  not  be  here  today. 

The  first  big  question,  and  one  which  I  am  sure  will  receive  a 
certain amount of attention during these several days, is: "What is a
cluster?" For human populations I have seen no definition that can be
unequivocally translated into operational procedures and few if any which
seem  to  have  satisfied  even  those  investigators  who  have  proposed  and 
used  them.  A  typical  statement  is  that  a  cluster  (type,  group,  species) 
is  composed  of  individuals  (objects,  specimens,  activities)  such  that  every 
member of the cluster is in some relevant sense "closer to" other members
of his group than he is to members of other groups. As a definition, a
statement such as this is of course circular and permits of various
interpretations, depending upon the investigator's purpose, the nature of the
data he happens to have at hand, and the computer program that he has been
able to borrow. In a typical study in the social sciences one does not
know at the outset whether any types or clusters exist (by whatever
definition); how many clusters to expect, if any; what proportion of the sample
can  be  comfortably  assigned  to  one  or  another  of  the  clusters  that  may  be 
discovered;  or  what  kind  of  statistical  conclusion  or  probability  statement 
can  be  made  to  reflect  one's  degree  of  confidence  in  the  findings.  There 
is  often  no  preliminary  statement  of  a  clear  model,  either  substantive  or 
structural, that the investigator is seeking to confirm. In many
psychological studies seeking clusters we may get useful hints about an
implied  theoretical  model  or  about  certain  likely  hypotheses  by  studying 
the  list  of  variables  the  investigator  has  chosen  to  analyse.  But  this 
is  not  always  a  clear  guide,  since  variables  seem  to  be  chosen  frequently 
because  of  availability  or  for  even  more  obscure  reasons.  So  I  hope  to 
leave  this  conference  feeling  a  little  more  secure  about  the  cluster 
concept— what  a  cluster  is,  how  to  recognize  one  when  I  see  one,  what 
advances  are  being  made  toward  operational,  objective  methods  of  cluster 
identification. 

My  second  area  of  concern  has  to  do  with  the  choice  of  variables 
to  be  used  in  a  cluster  analysis.  As  I  implied  a  moment  ago,  ideally 
the  variables  should  be  specified  by  the  investigator's  initial  hypothesis 
or  model,  but  in  the  typical  "exploratory"  study  this  is  not  always  the 
case.  Sometimes  there  does  not  appear  to  have  been  a  clear  understanding 
of  the  nature  and  characteristics  of  some  of  the  variables  employed. 
Occasionally  sets  of  variables  from  quite  dissimilar  domains  have  been 
brought  together  in  attempts  to  seek  clusters  within  a  common  set  of 
dimensions.  Some  research  programs  have  taken  what  seems  to  me  to  be  a 
sensible  course  (at  least  in  those  areas  of  psychology  in  which  such  a 
course  is  applicable):  Variables  are  selected  which  are  relevant  in  terms 
of  the  investigator's  theory  and  whose  characteristics  or  "meanings"  are 
well  understood  from  previous  work  (e.g.,  validity  studies,  factor  analysis, 
or  the  like).  I  understand  that  in  some  fields,  such  as  biological  taxonomy, 
there  are  some  fairly  explicit  models  and  that  the  selection  of  variables 
to  be  used  in  classification  can  thereby  be  more  rationally  determined. 

I  hope  that  Professor  Sokal  will  enlighten  us  on  this. 

Related  to  the  selection  of  variables  is  the  problem  of  their 
distributional  form,  and  the  associated  problem  of  the  metric  properties 
of  the  data.  Some  investigators  insist  or  prefer  that  only  normal  (or 
normalized) variables be used. Others do not hesitate to use nonnormal
data, dichotomies, or orthogonal "dummy" components of multichotomous
data. There is more than a matter of taste involved here. Are certain
relevant data in the domain "inherently" nonnormal or qualitative? What
are  the  scalar  properties  of  a  given  variable?  Considering  the  selective 
and/or  haphazard  conditions  under  which  many  of  our  human  samples  are 
drawn  (in  schools,  hospitals,  etc.)  and  the  adventitious  origins  of  many 
of the observations behavioral scientists use, how can we reasonably
expect or demand any particular distribution forms? One great advantage
of normality (in particular, multivariate normality, which isn't easy to
come  by)  is  that  it  facilitates  various  statistical  manipulations  and 
permits  certain  significance  tests.  But  cluster  analysis  is  a  long  way 
from  becoming  a  statistical  method  and  in  the  meantime  there  are  probably 
some  more  pressing  problems  deserving  priority  than  the  matter  of  whether 
distributions  conform  to  the  normal  or  any  other  standard  shape. 

One  of  the  more  critical  problems  in  cluster  analysis  and  related 
techniques  is  the  choice  of  an  interperson  index,  since  the  process  usually 
starts with a table or matrix of n x n numbers, each representing
comparisons of each individual in a sample of n with every other individual.
These indices, as you know, are typically one of two kinds: measures of
similarity (correlations, covariances, cross-products, "per cent agreement")
or measures of dissimilarity ("Euclidean" or some other index of "distance").
You  are  all  familiar  with  these  indices  and  their  major  characteristics. 

The  point  I  want  to  make  is  that  some  investigators  seem  to  have  made  their 
choice of index on the grounds of convenience or familiarity without
recognizing that different indices can give rise to quite different cluster
configurations. It is not necessary at this time or in this company to
elaborate  or  document  this  statement.  I  shall  be  interested,  however,  to 
learn  from  some  of  our  participants  their  reasons  for  choosing  the  indices 
they  have  used  and  their  experiences  and  recommendations. 
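
The dependence of the outcome on the index can be illustrated with a minimal sketch. The three score profiles below are hypothetical, chosen only so that one profile shares the shape of profile A at a higher level and another has the reverse shape at the same level; the function names are likewise illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two score profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation(a, b):
    """Product-moment correlation between two score profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical profiles on four variables.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [7.0, 8.0, 9.0, 10.0],  # same shape as A, higher level
    "C": [4.0, 3.0, 2.0, 1.0],   # same level as A, reversed shape
}

def nearest(name, index, larger_is_closer):
    """Return the other profile judged closest to `name` under `index`."""
    others = [k for k in profiles if k != name]
    pick = max if larger_is_closer else min
    return pick(others, key=lambda k: index(profiles[name], profiles[k]))

nearest_by_distance = nearest("A", euclidean, larger_is_closer=False)
nearest_by_correlation = nearest("A", correlation, larger_is_closer=True)
print(nearest_by_distance, nearest_by_correlation)
```

Under the distance index A is paired with C; under the correlational index, which discards level, A is paired with B.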

Incidentally,  in  a  hasty  and  incomplete  survey  of  the  social 
science  literature  covering  the  past  five  or  six  years,  I  have  found  that 
the  distance  type  of  index  is  now  leading  the  correlational  type  by  about 
two to one. I think there may be several reasons for this: (1) Distance
measures  have  received  more  respectful  attention  from  statisticians,  who 
have  as  you  know  developed  some  elaborate  distance-based  models  for  use 
in  the  closely  related  classification-decision  problems.  (2)  Correlational 
indices  ("Q"  measures)  have  certain  metric  problems  and  seem  to  suffer  from 
particular  ambiguities  from  the  sampling-significance  point  of  view. 

(3)  The  use  of  the  correlation  coefficient  involves  the  controversial 
"level"  concept,  which  has  not  always  been  squarely  faced.  (My  own  feeling 
is  that  "level,"  which  is  an  average,  can  be  removed  or  ignored  only  when 
it  is  demonstrably  irrelevant  to  the  investigator's  purpose  and  when  it 
has  a  clear  meaning  in  its  own  right,  e.g.,  the  mean  or  total  score  derived 
from a battery such as the Wechsler subtests. It follows, then, that the
variables must be from the same domain, must all "point in the same
direction" so far as their general behavioral significance is concerned, and
hence  be  positively  correlated.) 

I shall pass over several related technical matters such as the
appropriate dimensionality of one's space; whether the dimensions should
be  orthogonal  or  correlated;  the  questions  of  standardizing,  weighting, 
etc.,  with  the  suggestion  that  perhaps  we  are  not  yet  ready  for  decisions 
on  some  of  them.  Perhaps  we  need  more  experience  with  various  empirical 
approaches  which  aspire  no  higher  than  the  descriptive  and  the  topological. 

The  area  which  has  received  the  most  attention  recently,  with  the 
increasing  availability  of  electronic  computers,  concerns  the  efficient 
manipulation of data according to some routine or program designed to locate
clusters  if  they  exist  and  to  assign  each  assignable  member  of  the  sample 
to  his  appropriate  group. 

With  a  matrix  of  interpersonal  similarity  or  dissimilarity  measures, 
there  are  two  general  methods  of  attack  that  have  been  used.  The  older, 
and  still  the  most  frequent,  involves  the  locating  of  pairs  of  individuals 
who  are  "closest"  to  form  the  nuclei  for  types  or  clusters,  then  examining 
other  individuals  or  pairs  to  be  added  to  existing  groups  or  to  form 
tentative  new  groups.  Various  sequences  or  "rules"  have  been  adopted  and 
various  criteria  for  inclusion,  exclusion,  or  reassignment — some  planned 
objectively  (and  hence  adaptable  to  computer  methods)  and  others  dependent 
upon  the  investigator's  judgment  at  various  points  in  the  process.  The 
rules  are  essentially  arbitrary  and  there  are  usually  a  number  of  individuals 
left  unassigned  to  any  group.  This  may  be  called  the  "synthetic"  approach 
to the clustering process. (A British writer has recently called it the
"agglomerative" method.)

The  other  major  approach,  which  has  been  attempted  more  frequently 
in  recent  years,  is  what  might  be  called  the  "analytic"  method  (or 
"divisive,"  in  the  term  of  our  British  colleague).  Instead  of  beginning 
with n individuals, each a "cluster of order one," and successively
combining pairs and larger groups until all or most have been assigned
according to some rule of "belongingness," the investigator begins with the
entire sample as one cluster and asks "How can I divide these into two
groups,  each  of  which  is  more  homogeneous  with  respect  to  some  criterion 
or  standard  than  is  the  total  sample  and  more  homogeneous  on  the  average 
than  would  be  the  case  if  any  other  partitioning  into  two  groups  were 
made?"  The  criterion  may  be  something  like  minimizing  within-groups  sums 
of  squares  or  maximizing  between-groups  differences. 

Next,  having  divided  the  original  sample  into  two  groups  according 
to  the  criterion  (which,  if  carried  out  completely,  involves  examining 
each of the possible (2^(n-1) - 1) partitions), the investigator may analyze
each  of  them  and  search  for  ways  to  divide  them  into  further  subgroups. 

This  sequence  of  steps  may  be  continued  and  the  results  tested  at  each 
stage  (although  an  "exact"  test  of  an  appropriate  null  hypothesis  for 
such  a  procedure  is  not  known). 
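
One complete application of the "analytic" step can be sketched as follows, assuming the criterion is the pooled within-groups sum of squares. The one-variable scores and the function names are hypothetical. The first individual is held in the first group so that each division is examined exactly once, which gives the 2^(n-1) - 1 count.

```python
from itertools import combinations

def within_ss(group):
    """Sum of squared deviations of the points from their centroid."""
    if not group:
        return 0.0
    dims = len(group[0])
    centroid = [sum(p[d] for p in group) / len(group) for d in range(dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in group for d in range(dims))

def best_two_way_split(points):
    """Examine every division into two non-empty groups; keep the best."""
    n = len(points)
    best = None
    examined = 0
    # Point 0 always sits in the first group; 0 to n-2 of the
    # remaining points join it, so the second group is never empty.
    for size in range(0, n - 1):
        for extra in combinations(range(1, n), size):
            first = (0,) + extra
            second = tuple(i for i in range(n) if i not in first)
            examined += 1
            ss = (within_ss([points[i] for i in first])
                  + within_ss([points[i] for i in second]))
            if best is None or ss < best[0]:
                best = (ss, first, second)
    return best, examined

# Hypothetical scores for five individuals on one variable.
scores = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
(ss, first, second), examined = best_two_way_split(scores)
print(first, second, ss, examined)
```

The count of divisions doubles with each added individual, so the complete search is practical only for very small samples.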

The  result  of  this  series  of  operations  will  ordinarily  be  an 
hierarchical  "tree"  configuration  of  groups,  consisting  of  a  "trunk" 

(the  original  undifferentiated  sample),  one  or  more  orders  of  "limbs"  and 
"branches,"  and  finally  the  "twigs"  (the  ultimate  smallest  groups  which 
cannot  be  further  subdivided).  The  configuration  need  not  be  symmetric. 

Some  ultimate  categories  may  be  at  the  limb  or  branch  level. 

Two  characteristics  of  the  "analytic"  approach  in  comparison  with 
the  "synthetic"  are:  (1)  It  is  more  "objective"  and  hence  more  readily 
programmed for computers (at least in the forms in which recent
investigators have used these methods, though not necessarily in general); and
(2) it assures that every individual is assigned to one of the ultimate
groups, provided some quasi-statistical criterion is used to terminate the
process (such as a predetermined within-groups sum of squares or a minimum
number  of  cases  in  the  ultimate  categories). 


Obviously,  if  either  the  synthetic  or  the  analytic  procedure  is 
allowed  to  proceed  unchecked  by  any  rule  of  "when  to  stop,"  the  Sorcerer's 
Apprentice  will  take  charge.  The  synthetic  approach  will  ultimately 
assign  everyone  to  a  single  type,  and  the  analytic  will  finally  split  the 
entire  group  into  n  classes,  each  containing  one  person. 
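
The synthetic build-up and its dependence on a stopping rule can be sketched as below. This is only one of the many possible sets of "rules": it assumes that two groups are joined when their closest members are nearest (a nearest-neighbor rule) and that merging stops once the smallest such squared distance exceeds a threshold. The points, the threshold, and the function name are hypothetical.

```python
def agglomerate(points, threshold):
    """Merge the two closest groups repeatedly until none is within threshold."""
    def sq_dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    def linkage(g1, g2):
        # Squared distance between the closest pair of members.
        return min(sq_dist(i, j) for i in g1 for j in g2)

    clusters = [[i] for i in range(len(points))]  # n clusters of order one
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:  # the rule of "when to stop"
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical two-variable data: two tight groups and one isolate.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
clusters = agglomerate(points, threshold=4.0)
unchecked = agglomerate(points, threshold=float("inf"))
print(clusters, unchecked)
```

With the threshold in force the isolate is left unassigned; with no effective threshold everyone ends in a single type.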


Most  of  the  attempts  at  empirical  clustering  have  been  step-wise 
and/or  iterative  procedures.  A  solution  which  has  some  obvious  appeal  is 
that  the  investigator  form  every  possible  arrangement  and  test  each  such 
arrangement  against  the  criterion  he  has  chosen.  In  other  words,  he  would 
divide  his  subjects  into  every  possible  set  of  2  groups,  3  groups,  etc., 
and  test  every  such  set  of  partitions.  This  could  be  considered  a  frontal 
attack,  avoiding  some  of  the  theoretical  objections  to  the  synthetic  or 
analytic  approaches.  The  difficulty  with  this  idea  is  that  with  samples 
of  even  moderate  size  the  problem  is  beyond  the  ability  of  even  the  fastest 
and most capacious modern computer to handle. The number of ways of
classifying n individuals into r groups is n!/r! times the coefficient of x^n
in the expansion of the generating function (e^x - 1)^r. For a sample of
16,  which  is  certainly  as  small  as  most  investigators  would  want  to  use, 
the  total  number  of  arrangements  is  more  than  10  billion!  Hence  the  need 
for  short-cuts,  approximations,  and  iterative  approaches  to  the  clustering 
problem. 
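
The count can be checked with a short sketch. Rather than expanding the generating function, it assumes the standard recurrence S(n, r) = r S(n-1, r) + S(n-1, r-1) for the same quantity, the number of ways of classifying n individuals into r non-empty groups; the function names are illustrative.

```python
def groupings(n, r):
    """Number of ways of classifying n individuals into r non-empty groups."""
    table = [[0] * (r + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, r) + 1):
            # Individual i either joins one of j existing groups
            # or starts a new group of its own.
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][r]

def total_arrangements(n):
    """Total over every possible number of groups, r = 1, ..., n."""
    return sum(groupings(n, r) for r in range(1, n + 1))

two_group_count = groupings(16, 2)
total_for_16 = total_arrangements(16)
print(two_group_count, total_for_16)
```

For n = 16 the two-group count is 2^15 - 1 = 32,767, and the total over all numbers of groups is 10,480,142,147.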


In  order  to  have  more  time  for  discussions,  which  we  all  hope  will 
be  a  very  fruitful  part  of  this  conference,  I  shall  not  continue  along 
these  lines  at  this  time.  My  concluding  summary  remarks  (and  I  have  written 
some down) will be postponed until the end of the conference if anyone wants
to  hear  them.  I'll  be  very  much  interested  in  whether  and  to  what  extent 
I'll  want  to  change  them  by  that  time. 


Methods  of  Cluster  or  Typological  Analysis 
Maurice  Lorr 
Catholic University

The purpose of this report is to review and examine available methods of
typological or cluster analysis. To statisticians these techniques deal with
what is known as the mixture problem. The mixture problem is concerned with a
sample regarded as composed of individuals from several different populations.
Neither the number of populations nor their nature is known a priori. It is
also not known which individuals come from which populations. A general solution
requires estimating both the number of populations present and the
parameters of the different populations. Since the problem is very difficult,
exceedingly little work of a probabilistic nature has been done.

Through  usage,  cluster  analysis  has  come  to  refer  to  procedures  applied  for 
two  different  purposes.  One  reference  is  to  procedures  for  identifying  types, 
that  is  to  say,  homogeneous,  mutually  exclusive  subsets  of  individuals,  cases, 
objects  or  sampling  units  within  a  matrix  of  data.  This  process  may  be  called 
typological  analysis.  In  its  second  meaning,  cluster  analysis  refers  to  procedures 
for  grouping  attributes,  traits  or  characteristics.  Here  two  different  objectives 
may be distinguished. In one case the aim is data reduction or parsimony; a
smaller set of measures is used to represent the larger set with a minimal loss
of information. The second aim is to have each subset reflect some hypothetical
dimension. The process may thus be called dimensional analysis. The concern here
is  with  procedures  for  determining  types  not  known  a  priori. 

The  Utility  of  Typologies 

What  are  some  of  the  practical  and  scientific  uses  of  typologies?  It  is 
obvious that a type facilitates communication. The unique pattern of type
characteristics makes members of a type easily recognized, remembered, understood and
differentiated from non-members in a given domain. To label a person a psychopath
or a schizoid immediately suggests a broad pattern of traits and to-be-expected
behavior.  A  second  related  advantage  is  that  type  membership  may  provide  enhanced 
predictions  to  outside  criteria  particularly  if  relations  among  variates  are  strongly 
nonlinear.  A  sample  of  persons  of  identical  or  homogeneous  profile  will  tend  to 
be  more  homogeneous  as  to  criterion-relevant  behavior  than  the  mixed  population 
(Toops, 1948). The integrity of the individual is preserved in the type concept
since the entire score profile is considered simultaneously. Usually his scores
are considered singly and in isolation. The improvement in predictive accuracy
takes  place  through  the  operation  of  higher  order  dependencies  and  through  the 
utilization  of  any  interactions  should  they  exist.  In  linear  regression  equations 
the  predicted  Y  scores  are  simple  weighted  additive  sums  of  the  predictor  scores 
in  which  the  weights  are  constants.  Interactive  effects,  like  the  simultaneous 
presence of, say, two high scores and two low scores, are ignored. The possibilities
of such configural relations have been shown by Meehl (1950), Horst (1956), and by
Lubin and Osburn (1957). For example, two dichotomous items may be totally
unrelated to a dichotomous criterion (such as schizophrenic vs normal) when scored
singly.  Yet,  when  scored  for  their  joint  presence  or  absence,  these  two  items 
may  provide  near  perfect  prediction  to  the  criterion. 
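
A minimal sketch of such a configural relation follows. The 100 cases are fabricated for illustration only and are arranged so that the criterion is present exactly when the two items disagree; the variable names are hypothetical.

```python
import math

def correlation(xs, ys):
    """Product-moment (phi) correlation between two dichotomous variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative cases only: (item 1, item 2, criterion), 25 of each pattern.
cases = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)] * 25

item1 = [c[0] for c in cases]
item2 = [c[1] for c in cases]
criterion = [c[2] for c in cases]
# Configural score: 1 when the two items disagree, 0 when they agree.
pattern = [1 if a != b else 0 for a, b in zip(item1, item2)]

r_item1 = correlation(item1, criterion)
r_item2 = correlation(item2, criterion)
r_pattern = correlation(pattern, criterion)
print(r_item1, r_item2, r_pattern)
```

Each item scored singly correlates zero with the criterion, while the joint pattern predicts it perfectly. No weighted additive sum of the two items can reproduce this.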


A taxonomy of naturally occurring types represents an important achievement in
its own right. If there are discrete, qualitatively distinct subtypes present and
demonstrable, then this knowledge reflects an increased understanding of the domain.
The taxonomy may have much systematic import and generality. It may facilitate
the discovery of laws not observable within mixed samples. The subgroups may
provide or suggest information relative to common structure, common processes, and
common antecedents much as they do in biology.

In opposition to the general purpose typing approach just described, the
proponents of the single purpose approach argue that there is no single meaningful way
to classify people. It all depends on one's purpose. Persons similar in one set of
variables are not necessarily more similar than persons in general on another set
of variables. A particular classification is meaningful only in so far as it is
related to a broader class of variables one desires to predict or control. In
this approach some mathematical function of the profile elements is found or
constructed which will best predict the external criterion. Emphasis is on the
criterion relevancy of the type characteristics and not on the nature of the profile.
Finally it is argued that multiple linear or curvilinear regression is more efficient
than  prediction  from  knowledge  of  type  membership. 

It  is  true  that  there  are  numerous  ways  of  classifying  people  in  a  given 
domain  depending  upon  one's  aims.  However,  the  presumption  in  the  mixture  problem 
is that two or more natural subgroups exist. If they exist, they are likely to
have  arisen  or  developed  because  of  survival  value,  or  because  of  a  conjunction 
of  natural  laws.  In  contrast  the  classification  schemes  and  configural  scores  tied 
to  external  criteria  represent  technological  advances  lacking  scientific  generality. 
Each  new  decision  and  each  particular  situation  calls  for  another  empirical  search 
for  a  criterion-relevant  pattern.  While  useful  for  a  while  these  cook-book  patterns 
are  soon  outdated  as  new  criteria  or  potential  predictors  appear.  The  argument 
against  special-purpose  types  is  comparable  to  that  offered  in  support  of  the 
development  of  psychological  tests  as  instruments  of  psychological  theory  (Loevinger, 
1957;  Cattell,  1946).  Just  as  criterion-oriented  psychometrics  and  particularized 
validation  are  devoid  of  scientific  interest,  so  are  single-purpose  classification 
schemes. 

Structural  Models 

Before examining specific procedures for finding subsets of entities the problem
of structural models requires consideration. The overall problem is one of
developing a fruitful means for representing the data. Cluster-search procedures should
determine rather than impose structure on a body of data. If, for example, points
are  uniformly  distributed  in  space  no  clusters  should  be  found.  Indeed  empirical 
data  suggest  that  clusters  may  vary  greatly  in  shape.  In  three-dimensional  space, 
they  may  be  spheroid,  serpentine,  amoeboid  or  cloud-like.  Thus  it  should  be 
evident that quite different cluster-search methods are needed to ascertain different
structures and different objectives. There should be no arbitrary partitioning or
chopping  up  of  space. 


Cluster-Search  Techniques 

The  cluster-search  procedures  may  be  classified  for  purposes  of  description 
and discussion into the following categories: (a) factor analysis, (b)
multidimensional scaling, (c) minimizing within-cluster variation, (d) successive cluster
build-up, (e) linkage analysis and (f) hierarchical analysis.

A. The Method of Factor Analysis

Factor analysis of the N by N matrix of interperson similarities followed by
a  rotation  to  simple  structure  has  been  a  common  procedure  for  identifying  types. 
Stephenson  (1936),  Tryon  (1955),  Boss  (1957),  Broverman  (1961),  Nunnally  (1962), 
Overall (1964) and many others have recommended this procedure. The indices of
resemblance may be correlations, normalized crossproducts of scores, squared
distances,  or  simply  crossproducts  of  raw  scores. 

Factor  analysis  is  deemed  inappropriate  because  the  method  is  designed  to 
isolate  dimensions  and  not  clusters  of  entities.  There  is  no  reason  why  clusters 
defined  by  two  or  more  dimensions  may  not  be  more  numerous  than  dimensions.  The 
rotational process also is inappropriate for the task of isolating mutually
exclusive subgroups. The usual rotational process tends to dismember clusters or to
miss  them  altogether.  If  a  cluster  should  happen  to  fall  between  two  factors, 
each  type-factor  will  be  defined  by  persons  on  the  margins  of  the  cluster. 

Also factoring tends to yield a multiple classification of persons since most
persons will correlate significantly with several type-factors, and relatively
few  with  one  factor. 

When correlations, covariances and normalized crossproducts are factored,
all  unrotated  factors  are  bipolar  and  such  bipolarity  cannot  be  completely  removed 
(Ross,  1963).  Thus  persons  with  opposite  score  profiles  will  emerge  with  high 
but  opposite  loadings  on  the  same  factor.  Each  type-factor  is,  therefore, 
defining  two  types  rather  than  one.  Thus,  the  number  of  type-factors  defined 
cannot  be  the  same  as  the  number  of  types. 

The most cogent general argument advanced against the use of factor analysis
of similarity indices between persons is that it does not yield new information.
The number of factors resulting from a direct R-analysis of measures and an
obverse Q-analysis of persons will be the same (Burt, 1937; Harris, 1955; Slater,
1958;  Ross,  1963;  Ryder,  1964).  If  variables  have  been  standardized  over  subjects, 
a  principal  component  analysis  of  sums  of  score  profile  crossproducts  yields 
exactly  the  same  results  as  an  analysis  of  correlations  among  variables. 
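
The reason for the equivalence can be sketched with the singular value decomposition. The symbols are introduced here for illustration and are not in the original: Z is the N by K matrix of scores standardized over subjects, U and V have orthonormal columns, and S is diagonal.

```latex
Z = U S V^{\top}, \qquad
Z^{\top} Z = V S^{2} V^{\top}, \qquad
Z Z^{\top} = U S^{2} U^{\top}, \qquad
Z V = U S .
```

The K by K crossproducts among variables (proportional to the correlation matrix when the scores are standardized) and the N by N crossproducts among persons thus share the same nonzero roots S^2, and the person components U S are simply the scores Z projected on the variable components V.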

Lazarsfeld's latent class model (1950) as further extended by Gibson (1959)
also calls for a factor analysis. The technique operates on the interrelations
of dichotomous attributes. Manifest joint frequencies are accounted for by a set
of Q mutually exclusive and exhaustive subgroups (latent classes). The model
assumes that each subgroup or latent class is homogeneous in whatever underlying
dimensions are necessary to account for the observed interrelations. Stated
otherwise, there is within-class independence between pairs of tests. The
number of latent classes is determined by a factor analysis of the lower order
joint occurrence matrix.
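
The way within-class independence accounts for manifest joint frequencies can be shown in a minimal sketch. The two class sizes and the item probabilities below are hypothetical, and only two items are used.

```python
# Hypothetical latent structure: two classes and two dichotomous items.
class_sizes = [0.6, 0.4]   # relative sizes of the latent classes
p = [
    [0.9, 0.8],            # class 0: probability of a positive response to items 0 and 1
    [0.2, 0.1],            # class 1: the same probabilities in the other class
]

def marginal(i):
    """Manifest proportion positive on item i in the mixed sample."""
    return sum(v * p[q][i] for q, v in enumerate(class_sizes))

def joint(i, j):
    """Manifest joint proportion, assuming independence within each class."""
    return sum(v * p[q][i] * p[q][j] for q, v in enumerate(class_sizes))

manifest_joint = joint(0, 1)
if_independent = marginal(0) * marginal(1)
print(manifest_joint, if_independent)
```

The items are independent within each class, yet in the mixed sample their joint proportion exceeds the product of the marginals. The association observed in the total sample is attributed entirely to the mixing of classes.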

One question that can be raised is how configural information from higher
order joint occurrences can affect this solution. Lunnenborg (1959) has argued
that the independence of items effectively precludes the possibility of configural
information unless the latter is present in the sets of items prior to the
determination of latent classes. Another limitation to the method is that it appears
to be confined to variates of relatively small dimensionality, usually one or two.
Most typing problems in psychology involve six or more dimensions.
Although there is nothing in the development of the model equations that restricts
the number of dimensions, empirical examples involving more seem not to have
been published. Another obstacle is that latent class sizes must be estimated or
known in advance.

B.  Minimizing  Within  Cluster  Variation 

One procedure, often proposed, is to subdivide the N profiles in K-space
into Q mutually exclusive subsets in such a way that each is as compact and
homogeneous as possible. Compactness is achieved by requiring that the average of all
distances between profiles within each subset be a minimum. This technique
has been variously labeled a "minimum variance partition" and a "minimum squared
error technique."
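
The compactness criterion can be sketched as follows, assuming squared Euclidean distance and an average pooled over all within-subset pairs. Both are assumptions, since the statement above does not fix the distance or the form of averaging. The profiles and partitions are hypothetical.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_within_distance(profiles, partition):
    """Average squared distance over all pairs falling in the same subset."""
    total, pairs = 0.0, 0
    for subset in partition:
        for i, j in combinations(subset, 2):
            total += sq_dist(profiles[i], profiles[j])
            pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical profiles of five persons on two variables.
profiles = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]

compact = mean_within_distance(profiles, [[0, 1, 2], [3, 4]])
straddling = mean_within_distance(profiles, [[0, 1], [2, 3, 4]])
print(compact, straddling)
```

The partition that respects the two visible groups gives a far smaller average than one that moves a single profile across the gap. A minimizing procedure searches for the partition with the smallest such value.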

One of the first of such efforts was reported by Thorndike (1953). His
procedure begins by assuming that the two profiles which are the greatest distance
apart fall into different subgroups. A third subgroup is established with a
profile which is furthest away from either of the other two. Each cluster is
built up by adding that profile nearest the pivot defining the cluster. A
profile is added to each cluster in turn until all specimens are assigned. This
yields sets of clusters of equal size. Profiles found closer to members of
another cluster than to their own are re-assigned until further shifts do not reduce
within-cluster distances. Increases in the number of clusters are made in the same
manner until the average within-cluster distances relative to the number of clusters
stabilize. While the procedure is comparatively objective it has some limitations,
a few of which will be mentioned. For instance, the goal of assigning specimens
so that the average within-cluster distances are at a minimum involves a fair
degree of trial and error and no criterion for optimal termination. There are
no limits set in assigning profiles close to two clusters; every profile is
allocated to a cluster. There also appears to be no justification for assigning every
profile to a cluster, nor for seeking subgroups of equal size. Finally the
number of groups must be specified in advance.

Zubin, Fleiss, and Burdock (1963) have proposed a procedure for fractionating
a population into homogeneous subgroups that resembles Thorndike's. First the
matrix of D^2's is scanned and the largest entry identified. The two profiles
involved, say X and Y, then form the foci of two subgroups. About each of these
foci separately is clustered each profile whose D^2 from the focus is less than
the fifth centile of all the squared distances. These two clusters are taken as
nuclei. Then about each of these nuclei are clustered profiles whose average
D^2 from members of the nucleus is less than the tenth centile of all distances.

The  criterion  of  inclusion  may  be  relaxed  still  further  until  every  profile  in 
the  sample  has  been  assigned  to  one  of  the  subgroups.  A  profile  that  satisfies 
a  criterion  for  both  clusters  is  assigned  to  the  group  to  which  it  is  closer. 

The subgroups are then tested by chi square for homogeneity. If the clusters
are not yet homogeneous, the next step is to identify that trio of profiles
mutually farther apart from one another than any other triplet. Profiles are
again clustered about each of these foci and the homogeneity of the resulting
subgroups is tested. This procedure is continued either until all groups are
homogeneous or the number of groups to be found is so great as to be meaningless. The
procedure assumes normality in the underlying groups, independent measures, and
equal covariance matrices. The method tends to guard against the detection of
spurious clusters since it allows for the possibility that the population
studied is homogeneous to begin with.

Forgy (1965) has delineated some of the shortcomings of the minimum variance
technique. As an illustration he cites data from the field of astronomy reported
by Hertzsprung and Russell. When stars are plotted by absolute luminosity and
temperature, two "natural" groups of stars are evident. The so-called "main
sequence" stars appear as a flat S pattern while the "red giants" group together
in a compact cluster. A minimum variance partition of such a sample could cut
right across these groups since such a partition would produce a smaller
within-group sum of squares. Thus the method tends toward the arbitrary partitioning
of space into "efficient" subsets. It is unsuited for the recovery of natural
subgroups differing in configuration.

C.  Successive  Cluster  Buildup 

In this technique either a single pair of profiles (usually the closest pair)
or a profile with greatest variance is selected as a nucleus for the cluster.
Other profiles are assigned to the cluster on the basis of a definition of
similarity which sets a limit or threshold for inclusion. The number of clusters
to be determined need not be specified in advance.

McQuitty  (1961,  1963)  has  developed  several  procedures,  called  typal  analysis, 
representative  of  successive  cluster  buildup.  He  defines  a  type  as  a  category  of 
N people such that everyone in the category is more like each of the other N-1
persons than he is like any other person in any other category. The method starts
with a table of similarity indices between people. The indices of every column
are  then  arranged  in  rank  order  and  submatrices  are  built  that  satisfy  the 
definitions  of  type.  A  submatrix  satisfying  the  definition  of  type  contains  no 
rank  larger  than  the  number  of  persons  in  the  type.  Suppose  a  type  consists  of 
persons  A  and  B,  A  being  most  like  A  and  second  most  like  B,  and  B  in  turn  being 
most  like  B  and  second  most  like  A.  Then  the  submatrix  constitutes  a  type  if  it 
contains  no  rank  larger  than  the  number  of  cases.  This  process  continues  until 
all  persons  of  the  original  matrix  have  been  chosen  in  order  of  their  similarity  to 
A.  The  problem  is  to  select  from  the  full  matrix  of  indices  all  of  the  submatrices 
which  fulfill  the  definition  of  a  type.  The  advantages  claimed  for  the  method 
are  that  (a)  it  can  reject  an  hypothesis  of  types;  (b)  it  reports  exceptions  to 
a type. If typal analysis fails to yield types, it is possible to relax the
definition  and  permit  inclusion  of  persons  with  slightly  higher  ranks  than  are 
permitted  by  the  usual  definition. 
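
The rank test for a type can be sketched as follows. This is an illustrative reading of the definition above, not McQuitty's own program: it assumes a symmetric similarity table with no tied entries and with each person's self-similarity on the diagonal as the largest value in the column, so that every person ranks first in his own column. The names `rank_columns` and `is_type` and the sample table are hypothetical.

```python
def rank_columns(sim):
    """ranks[i][j] is the rank of person i in column j (1 = most like j)."""
    n = len(sim)
    ranks = [[0] * n for _ in range(n)]
    for j in range(n):
        order = sorted(range(n), key=lambda i: sim[i][j], reverse=True)
        for rank, i in enumerate(order, start=1):
            ranks[i][j] = rank
    return ranks


def is_type(sim, members):
    """True if the submatrix for `members` holds no rank above len(members)."""
    ranks = rank_columns(sim)
    size = len(members)
    return all(ranks[i][j] <= size for i in members for j in members)


# Hypothetical similarity indices among four persons.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

print(is_type(sim, [0, 1]))  # persons 0 and 1 rank each other second
print(is_type(sim, [0, 2]))  # person 2 ranks third in column 0
```

Relaxing the definition, as described above, would amount to allowing ranks slightly larger than the number of members.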

Sawrey, Keller, and Conger (1960) also have designed a cluster buildup
procedure which uses the distances (D²'s) between each and every profile. First
an  arbitrary  maximum  is  set  as  a  definition  of  "similarity."  Then  with  each 
profile  are  listed  all  other  profiles  in  the  matrix  whose  distance  is  less  than  the 
maximum.  The  profile  with  the  largest  number  of  other  profiles  similar  to  it  is 
selected  to  form  a  potential  nucleus  group.  The  profile  selected  and  all  those 
similar  to  it  are  crossed  out  from  the  table.  The  profile  with  the  next  highest 
number  of  similar  profiles  is  then  selected  to  become  the  second  potential  nucleus 
group.  Again  the  associated  list  of  profiles  is  crossed  out  from  the  table.  The 
process  is  repeated  until  only  profiles  having  no  similar  profiles  remain.  Next 
a minimum value is set for the definition of "dissimilarity" and a matrix of the
selected profile indices is prepared. The columns of the matrix are summed and
dissimilar pivot profiles are selected. Selection proceeds from the profile
having the largest sum to the profile having the smallest sum. As a profile is
selected all other profiles which are not dissimilar to it (i.e., whose distance
from the selected profile is less than the minimum) are eliminated from the matrix.
The selected profiles are all at least the minimum distance from each other.
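
The first stage of the procedure, the formation of potential nucleus groups, might be sketched as below. Two points are assumptions made for the illustration and are not stated in the source: the counts of similar profiles are recomputed among the profiles not yet crossed out, and ties are broken in favor of the lowest index. The function name and the one-dimensional sample data are hypothetical.

```python
def potential_nuclei(d2, max_d2):
    """Return (nucleus groups, leftover profiles with no similar profile)."""
    remaining = set(range(len(d2)))
    nuclei = []
    while remaining:
        # For each profile still in the table, list the profiles similar to it.
        similar = {
            i: [j for j in remaining if j != i and d2[i][j] < max_d2]
            for i in remaining
        }
        # Select the profile with the most similar profiles (lowest index on ties).
        best = max(sorted(similar), key=lambda i: len(similar[i]))
        if not similar[best]:
            break  # only profiles having no similar profiles remain
        group = [best] + sorted(similar[best])
        nuclei.append(group)
        remaining -= set(group)  # cross the group out of the table
    return nuclei, sorted(remaining)


# Hypothetical profiles on a single variate; d2 holds squared distances.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
d2 = [[(a - b) ** 2 for b in points] for a in points]

nuclei, isolated = potential_nuclei(d2, max_d2=5.0)
print(nuclei)
print(isolated)
```

The later stages (selection of mutually dissimilar pivots and the addition of remaining profiles to nucleus groups) are not covered by this sketch.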

The  centroid  of  each  nucleus  group  (the  selected  profile  and  associated  list)  is 
determined.  Each  remaining  profile  is  added  to  a  nucleus  group  if  its  distance 
is less than the limit of dissimilarity from any member of the nucleus group.

Several maximum limits may be set for adding in additional profiles to existing
groups. Only an upper limit is used to form the nucleus groups. Although
distances among members of a cluster may vary greatly, these are ignored. Several
maxima  would  appear  needed  to  define  similarity  since  a  subgroup  whose  members 
are  more  widely  separated  from  each  other  and  from  other  groups  will  remain 
unrecognized. 

Saunders  and  Schucman  (1962)  have  developed  a  procedure,  called  syndrome 
analysis,  that  satisfies  McQuitty's  definition  of  type  but  operates  on  squared 
distances  between  profiles.  It  begins  by  regarding  every  individual  in  the  sample 
as  a  cluster  of  order  one.  First,  all  pairs  that  are  mutually  closest  to  each 
other  are  identified.  Then  all  triplets  whose  members  are  closest  to  each  other 
are  found.  Clusters  of  higher  order  are  identified  by  the  same  process  until 
no  more  clusters  appear  by  this  process.  A  list  of  "closed  clusters"  is  examined 
to  eliminate  those  which  are  contained  in  larger  closed  clusters  that  came  to 
light later in the process. The resulting non-overlapping closed clusters
are regarded as "nodes" for the given matrix. The third step is to characterize
the nodes. This may involve finding the mean profile of members of each node,
or it may involve construction of the within-node variance-covariance matrix of
test  scores.  The  latent  roots  and  vectors  of  the  matrix  may  provide  the  necessary 
coefficients  for  partialling  out  intra-node  variability  preparatory  to  iteration 
of  the  procedure.  Once  membership  has  been  established  the  resulting  subset  is 
called  a  syndrome. 

Several cluster-search procedures similar to those just described have been
developed by Lorr and his associates (Lorr et al., 1962; Lorr and Radhakrishnan,
1967). The procedure begins by finding a profile near the center of a cluster.
The profile with the maximum variance of squared correlations (or congruency
coefficients) with all others is selected as pivot. To the pivot are added
successively the two profiles with the highest average correlation with all profiles
correlating above a lower limit with the pivot. The limit may be set at the value at
which a correlation coefficient based on K independent variates is significant at
p less than .05. The matrix is searched and the profile added that correlates
highest on the average with those already in the cluster. The process continues
until no other profiles can be found that correlate on the average above the lower limit.

Next an upper limit is set to define dissimilarity and to prevent cluster overlap.
A suitable value is a correlation coefficient significant at p less than .10. Any
profile in the residual matrix that correlates on the average at the upper limit or
higher with the first cluster is deleted. The second cluster is generated in the same
manner as the first from the matrix of remaining profiles. The deletion of profiles
correlating above the upper limit with a newly formed cluster does not exclude profiles
correlating above that limit with preceding clusters. Accordingly, cluster members that
correlate on the average above the upper limit with the last generated cluster are also
deleted. The final steps consist in determining (a) the mean correlations within and
between clusters; (b) the mean standard score profile of each cluster. The
computer program can handle 150 profiles at one time.

Like  McQuitty's  typal  analysis,  the  procedure  proposed  by  Gengerelli  (1963) 
is based on a definition of a subgroup. Consider a population of N persons each
measured on K variates. Let each person be represented as a point in K-dimensional
space. Then a subgroup is defined as an aggregate of points in the test space such
that  the  distance  between  any  two  points  in  the  set  is  less  than  the  distance 
between  any  point  in  the  set  and  any  point  outside  of  it.  Suppose  N  persons  as 
points  are  distributed  in  three-dimensional  space  as  two  spheres,  A  and  B.  Two 
subsets  will  exist  only  if  the  two  spheres  are  separated  by  a  distance  greater 
than  the  diameter  of  the  larger  sphere.  The  method  begins  with  an  N  by  N  matrix 
of  squared  distances.  A  frequency  distribution  is  made  of  distances  between  all 
possible  pairs.  The  existence  of  one  or  more  discontinuities  in  the  distribution  of 
distances  indicates  that  a  population  consists  of  two  or  more  subsets.  The  first 
point of discontinuity in the distribution, Dc, provides a criterion for
determining the point of separation between two subsets. A subset is then defined as
the aggregate of points (persons) who are mutually no farther apart one from another
than Dc. The existence of subsets in a population is thus associated with
multimodality in the distribution of inter-point distances. Computer programs and empirical
tests are as yet not available.
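
Since no program is reported for this method, the following is only a hypothetical sketch of the subgroup definition itself: a set of points qualifies when its largest internal distance is smaller than its smallest distance to any outside point. The function name and the one-dimensional sample data are assumptions made for illustration.

```python
def is_subgroup(d, members):
    """True if every within-set distance is below every set-to-outside distance."""
    members = set(members)
    outside = [k for k in range(len(d)) if k not in members]
    largest_within = max(
        (d[i][j] for i in members for j in members if i < j), default=0.0
    )
    smallest_between = min(
        (d[i][k] for i in members for k in outside), default=float("inf")
    )
    return largest_within < smallest_between


# Hypothetical persons located on a single variate.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
d = [[abs(a - b) for b in points] for a in points]

print(is_subgroup(d, [0, 1, 2]))  # widest internal gap 2, nearest outsider 8 away
print(is_subgroup(d, [0, 1]))     # person 2 is as close to person 1 as person 0 is
```

The check mirrors the two-sphere example above: the first three points form a subgroup only because the gap separating them from the rest exceeds their own spread.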

Bonner  (1964)  has  been  responsible  for  several  programs  for  clustering 
binary  attributes,  one  of  which  has  been  generalised  to  continuous  data  (Pettit, 
1964).  One  program  is  based  on  a  type  definition  resembling  McQuitty's.  The 
goal is to find clusters where all members are similar to each other and no
nonmember is similar to all members. The algorithm picks a random "center" and
builds  a  cluster  around  this  through  use  of  an  arbitrary  threshold  T.  Profiles 
more  similar  to  the  center  than  T  are  considered  to  be  in  the  crude  cluster. 

The  typical  member  of  the  cluster  is  computed  and  compared  with  the  expected 
number  of  clusters  rarer  than  this  to  be  found  in  an  uncorrelated  population. 

Then  by  means  of  a  process  of  "hill  climbing"  a  better  cluster  is  achieved.  All 
profiles  are  used  as  cluster  centers. 

Rogers and Tanimoto (1960) have reported a computer program for the
classification of plants. Their variables are binary and a simple similarity coefficient
is used. After a matrix of similarity coefficients has been obtained, a value
Rj is computed as a measure of the number of nonzero similarity coefficients
possessed by a given individual j. Next computed is a quantity Hj, which is the
product of all the similarity coefficients of j with others. All persons are
then grouped in a table in order of descending value of Rj. The person having
the highest Rj and the highest Hj is considered the prime node. The problem is
to find a criterion to determine the number of persons who go into a cluster.
To do this a second node is found. The radius around the first node must be such
as not to include the second node. At this point the similarity coefficients are
converted into distances defined as Dij equals -log2 Sij. These distances permit
visualization of taxonomic similarity. Finally a measure of cluster inhomogeneity
is computed. The method has proved to be fairly effective in isolating subsets
when the variables are truly qualitative categories.
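
The quantities named above can be illustrated with a short sketch. Several details are assumptions rather than statements from the source: the similarity coefficient is taken to be the number of attributes two individuals share divided by the number possessed by either; Rj and Hj are computed over the nonzero coefficients of j with the other individuals (excluding j itself); and the prime node is chosen by ordering on Rj first and Hj second. Function names and data are hypothetical.

```python
import math


def similarity(a, b):
    """Shared attributes divided by attributes possessed by either individual."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def node_statistics(profiles):
    """Return the similarity matrix and, per individual j, the pair (Rj, Hj)."""
    n = len(profiles)
    s = [[similarity(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    stats = []
    for j in range(n):
        nonzero = [s[i][j] for i in range(n) if i != j and s[i][j] > 0]
        stats.append((len(nonzero), math.prod(nonzero)))
    return s, stats


def distance(s_ij):
    """Dij = -log2 Sij; a zero similarity maps to an infinite distance."""
    return math.inf if s_ij == 0 else -math.log2(s_ij)


# Hypothetical binary attribute profiles for four individuals.
profiles = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 0)]
s, stats = node_statistics(profiles)
prime_node = max(range(len(profiles)), key=lambda j: stats[j])

print(stats)
print(prime_node)
print(distance(s[0][3]))
```

Under the logarithmic transformation, identical individuals lie at distance zero and a halving of similarity adds one unit of distance.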

Cattell and Coulter (1966) have developed a procedure that represents a
variant of cluster buildup. Given a matrix of similarity indices, the next
step  is  to  establish  several  arbitrary  limits  as  definitions  of  similarity. 

The  matrix  of  similarities  is  then  converted  into  an  "incidence"  matrix  of 
ones  and  zeros.  If  an  index  exceeds  the  limit  it  is  categorized  as  a  unit  to 
designate  a  linkage;  otherwise  it  is  categorized  as  a  zero.  Next  a  "phenomenal 
cluster"  is  defined  as  a  set  of  profiles  each  of  which  is  linked  to  every  other. 
Spatially this means that all points fall within a hypersphere. A Boolean
algorithm, based on what has been called the "ramifying linkage method," sorts the data
into phenomenal clusters.
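
The conversion to an incidence matrix and the search for sets of mutually linked profiles can be sketched as follows. The search shown here is the Bron-Kerbosch enumeration of maximal fully linked sets, substituted for illustration; it is not the ramifying linkage algorithm itself, whose details are not given above. Names and data are hypothetical.

```python
def incidence_matrix(sim, limit):
    """1 where a similarity index exceeds the limit (a linkage), otherwise 0."""
    n = len(sim)
    return [
        [1 if i != j and sim[i][j] > limit else 0 for j in range(n)]
        for i in range(n)
    ]


def phenomenal_clusters(inc):
    """Maximal sets of two or more profiles in which each is linked to every other."""
    n = len(inc)
    linked = [{j for j in range(n) if inc[i][j]} for i in range(n)]
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) > 1:
                found.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & linked[v], excluded & linked[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(found)


# Hypothetical similarity indices among four profiles.
sim = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.6],
    [0.7, 0.9, 1.0, 0.2],
    [0.1, 0.6, 0.2, 1.0],
]
inc = incidence_matrix(sim, limit=0.5)
print(phenomenal_clusters(inc))
```

Note that the clusters found in this way may overlap: in the sample data the second profile belongs to both.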

D.  Linkage  Analysis 

Linkage  analysis  classifies  profiles  into  clusters  such  that  every  profile 
in  a  cluster  is  more  like  some  other  profile  in  that  cluster  than  it  is  like  any 
other  profile  in  any  other  cluster  (McQuitty,  1957).  This  method  is  especially 
useful  in  determining  elongated,  serpentine  or  amoeboid  clusters.  Profiles  are 
continuously connected with one another through intermediate profiles, thus
maintaining any specified level of similarity. Linkage analysis has also been much
applied  to  generate  hierarchies  which  will  be  considered  later. 

McQuitty  (1957,  1964)  has  been  among  the  first  to  develop  linkage  analysis 
which  is  perhaps  the  simplest  of  the  cluster  methods.  The  analysis  starts  with 
a  matrix  of  similarity  indices.  First  the  highest  entry  in  each  column  (a  linkage) 
is  found,  and  then  the  highest  entry  in  the  matrix  is  identified.  The  highest 
entry  (ab)  represents  a  reciprocal  pair  in  the  sense  that  members  are  mutually 
closest  to  each  other.  One  member  of  the  pair  (b)  may  also  be  the  highest  entry 
in  some  other  columns,  say  c  and  d.  Then  c  and  d  also  constitute  members  of  the 
cluster.  If  none  of  the  profiles,  a,  b,  c  and  d  is  highest  in  any  other  column 
the  cluster  is  complete.  The  highest  remaining  entry  in  the  matrix  is  then  used 
to build the next cluster. Additional clusters are determined analogously.
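
The steps above can be sketched in a few lines. The sketch assumes a symmetric similarity matrix with at least two profiles and no tied entries; the function name and sample matrix are hypothetical, and the diagonal is ignored.

```python
def elementary_linkage(sim):
    """Group profiles around reciprocal pairs by following highest column entries."""
    n = len(sim)
    # The highest off-diagonal entry in each column: the profile j is most like.
    nearest = [
        max((i for i in range(n) if i != j), key=lambda i: sim[i][j])
        for j in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # The highest remaining entry identifies a reciprocal pair.
        seed = max(sorted(unassigned), key=lambda j: sim[nearest[j]][j])
        cluster = {seed, nearest[seed]}
        grew = True
        while grew:
            # Add every profile whose highest entry points into the cluster.
            joiners = {j for j in unassigned - cluster if nearest[j] in cluster}
            grew = bool(joiners)
            cluster |= joiners
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters


# Hypothetical similarity indices among five profiles.
sim = [
    [0.0, 0.9, 0.4, 0.2, 0.1],
    [0.9, 0.0, 0.6, 0.3, 0.2],
    [0.4, 0.6, 0.0, 0.5, 0.3],
    [0.2, 0.3, 0.5, 0.0, 0.8],
    [0.1, 0.2, 0.3, 0.8, 0.0],
]
print(elementary_linkage(sim))
```

In the sample data the third profile joins the first cluster because its highest entry lies with a member of the reciprocal pair, even though it is not itself part of that pair.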

Cattell  and  Coulter  (1966)  also  employ  a  procedure  akin  to  linkage  analysis 
to  identify  strung-out  clusters.  Instead  of  beginning  with  individual  profiles 
they first identify all possible phenomenal clusters (hyperspheres). The amount
of overlap of the phenomenal clusters is recorded in a matrix which is then
converted, through application of a limit, into an incidence matrix of units and zeros.
This  latter  matrix  is  subjected  to  their  search  procedure  which  identifies  all 
mutually  exclusive  chains  of  continuously  related  profiles. 

Needham (1961) and Parker-Rhodes (1961) use linkage analysis with binary
data. The distance between all pairs of profiles is determined. A limit or
cutting score is set to define similarity and applied to the matrix which is reduced
to a matrix of zeros and ones. Columns of the matrix are then compared pair-wise
to determine the number of agreements or intersections between them. The resulting
subsets, called "clumps," are defined as members more like each other and less
like non-members than members of the universe picked at random.

The  method  of  single  linkage  has  also  been  suggested  by  Sneath  (1957)  and 
applied  to  taxonomic  problems  in  biology  (Sokal  and  Sneath,  1963).  They  point 
out  that  in  avoiding  overlapping  clusters  data  may,  in  fact,  be  distorted  to 
yield  discrete  clusters.  When  single  linkages  are  permitted  then  complicated 
serpentine clusters may be formed. More will be said under the topic of hierarchical
clusters.

E.  Multidimensional  Scaling 

Metric and nonmetric multidimensional scaling represent another possible, as
yet untried, approach to cluster identification. Given an N by N matrix of
interperson similarities, some of the standard routines developed by Torgerson (1958),
Shepard (1962), Kruskal (1964) and Lingoes (1965, 1966) could be applied. In
these  procedures,  individual  profiles  would  be  treated  as  points  in  space  of 
unknown  dimensionality.  The  problem  would  be  to  determine  the  dimensionality 
of  the  space  and  the  location  of  the  points  in  space.  In  the  final  solution 
distances  between  points  in  the  space  correspond  to  some  monotonic  function  of 
the  similarity  of  the  corresponding  profiles.  The  Guttman-Lingoes  procedures  are 
designed  for  the  treatment  of  categorical  qualitative  data  but  are  also  adapted 
for  use  with  quantitative  data. 

F.  Hierarchic  Cluster  Analysis 

Discussion,  thus  far,  has  been  restricted  to  techniques  for  finding  unordered 
qualitative  classes  or  so-called  natural  clusters.  Some  of  the  procedures  described 
have  also  been  applied  or  extended  to  the  problem  of  establishing  discrete  clusters 
each subdivided into subclasses. While there is some question in regard to the
range of application of such methods to psychological problems, their use in
biology is widespread. Sokal and Sneath (1963) assert that biological classification
should be constructed by nested overlapping categories (p. 192). Thus some of
the procedures for constructing hierarchic structures will be reviewed briefly.

Sneath's (1957) single linkage procedure is followed for the first sets.
Then the criteria of admission (thresholds) are gradually lowered from an initial
high similarity value to low similarity values. Thus a single link between any
members of two clusters permits the establishment of a more inclusive cluster.
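
The effect of lowering the threshold can be sketched as below. This is an illustration of the single linkage idea and not Sneath's program: at each threshold, any two profiles whose similarity reaches the threshold are linked, and clusters are the resulting chains. The function names, the use of a union-find structure, and the sample matrix are assumptions made for the sketch.

```python
def single_linkage_clusters(sim, threshold):
    """Clusters formed when one link at or above the threshold joins two groups."""
    n = len(sim)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # shorten the path while walking up
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def single_linkage_hierarchy(sim, thresholds):
    """Clusters at each threshold, taken from the highest value to the lowest."""
    return [
        (t, single_linkage_clusters(sim, t))
        for t in sorted(thresholds, reverse=True)
    ]


# Hypothetical similarity indices among five profiles.
sim = [
    [1.0, 0.9, 0.4, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.3, 0.2],
    [0.4, 0.6, 1.0, 0.5, 0.3],
    [0.2, 0.3, 0.5, 1.0, 0.8],
    [0.1, 0.2, 0.3, 0.8, 1.0],
]
for level, clusters in single_linkage_hierarchy(sim, [0.8, 0.6, 0.5]):
    print(level, clusters)
```

Each lower threshold yields clusters that contain those found at the higher threshold, which is what produces the nested arrangement.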

McQuitty has been responsible for numerous procedures for hierarchical
classification (1954, 1960, 1964). Agreement analysis classifies objects into
successive levels such as species, genera, and families. The first species is the two
objects  with  the  highest  agreement  score,  the  second  species  are  the  two  objects 
with  the  next  highest  score.  Species  are  then  classified  into  more  inclusive 
groups  analogous  to  the  way  in  which  individuals  were  classified.  Hierarchical 
linkage  analysis  seeks  to  classify  individuals  into  categories  such  that  every 
member  of  every  category  has  a  maximal  number  of  common  characteristics  and  a  minimal 
number  of  categories  are  required.  Later  modifications  have  led  to  what  is  called 
hierarchical  classification  by  reciprocal  pairs  and  by  typal  analysis. 

Ward (1963) and Ward and Hook (1963) have developed a very efficient
minimum-within-group distance procedure for hierarchical grouping of profiles. Each
larger group is a unique combination of the next subordinate subgroups. The
technique operates on an N by N matrix of profile distances. Clusters are built
up by adding cases which increase the mean within squared distance least.
Clustering starts with N groups of one and ends with one group of N. Initially the
matrix is scanned to find the pair of profiles with the smallest distance; these
are combined to form a cluster of two. The distances of the remaining profiles
from this cluster centroid are then computed. The process continues by reducing
the number of clusters from N (the original number) to N-1, N-2, etc.; at each
stage the within-groups sum of squares is minimized. In addition they utilize
an "objective function" which reflects the investigator's purpose to guide the
process.
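
A compact sketch of this kind of hierarchical grouping follows. It is not the Ward and Hook program: for brevity it works from the raw profiles rather than from an N by N distance matrix, and it uses the standard identity that merging groups of sizes a and b raises the within-groups sum of squares by ab/(a+b) times the squared distance between their centroids. The function name and data are hypothetical.

```python
def ward_grouping(profiles):
    """Merge groups stepwise, each time choosing the merger that adds least
    to the within-groups sum of squares. Returns (merged group, added cost)."""
    n_vars = len(profiles[0])
    clusters = [[i] for i in range(len(profiles))]

    def centroid(members):
        return [
            sum(profiles[m][v] for m in members) / len(members)
            for v in range(n_vars)
        ]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    history = []
    while len(clusters) > 1:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((merged, cost))
    return history


# Hypothetical data: four profiles measured on two variates.
profiles = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for merged, cost in ward_grouping(profiles):
    print(merged, cost)
```

The sharp rise in cost at the final merger is the kind of signal an investigator might inspect when deciding how many groups to retain, although, as noted below, no statistical test is offered.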

The  Ward  technique  assumes  nothing  about  the  underlying  structure  of  the  groups 
or  their  distributions.  In  fact  it  can  partition  any  collection  of  profiles 
whether  or  not  it  contains  "natural"  groups.  The  multivariate  distribution  may 
even be multivariate normal and thus unimodal. They offer no statistical test as
to how many groups are present. It is also likely that the nature of the groups
established may depend on chance variations in data. Many similar comments
can be made relative to the McQuitty techniques although they tend to be
set-theoretic in form. On the other hand it can be argued that these procedures
are in fact quite useful. They group jobs so as to reduce cross-training time,
they facilitate retrieval of information, and they increase predictive efficiency.

Edwards and Cavalli-Sforza (1965) also apply the minimum-within-cluster
sums of squares technique to construct hierarchic arrangements of clusters. The
profiles  are  divided  into  the  two  most  compact  clusters,  and  the  process  is 
repeated  sequentially  so  that  a  tree  diagram  is  formed.  The  advantage  of  a  tree 
representation  is  that  it  can  be  mapped  on  paper  in  two  dimensions.  Beginning 
at  the  base  of  the  tree  the  first  bifurcation  represents  the  first  split  of 
profiles  into  two  clusters.  Each  branch  is  split  again  as  the  two  clusters  are 
resolved  into  two  more,  and  the  process  continued  until  individual  points  are 
reached. 
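
The successive splitting can be sketched as follows. The sketch assumes that "most compact" means the two-way division with the smallest total within-cluster sum of squares and finds it by trying every division, which is feasible only for a handful of profiles. The function names and data are hypothetical.

```python
from itertools import combinations


def sum_of_squares(profiles, members):
    """Sum of squared deviations of the members from their centroid."""
    n_vars = len(profiles[0])
    centre = [
        sum(profiles[m][v] for m in members) / len(members) for v in range(n_vars)
    ]
    return sum(
        (profiles[m][v] - centre[v]) ** 2 for m in members for v in range(n_vars)
    )


def best_split(profiles, members):
    """Try every two-way division and keep the one with the least total scatter."""
    first, rest = members[0], members[1:]
    best = None
    for size in range(len(rest)):  # leave at least one profile for the other side
        for kept in combinations(rest, size):
            left = [first, *kept]
            right = [m for m in rest if m not in kept]
            score = sum_of_squares(profiles, left) + sum_of_squares(profiles, right)
            if best is None or score < best[0]:
                best = (score, left, right)
    return best[1], best[2]


def split_tree(profiles, members):
    """Nested tuples giving the tree; leaves are individual profile indices."""
    if len(members) == 1:
        return members[0]
    left, right = best_split(profiles, members)
    return (split_tree(profiles, left), split_tree(profiles, right))


# Hypothetical data: four profiles measured on two variates.
profiles = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(split_tree(profiles, [0, 1, 2, 3]))
```

The nested tuples correspond to the bifurcations of the tree diagram, read from the base outward.
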

A review of clustering methods in biological taxonomy

Robert  R.  Sokal 
The  University  of  Kansas 

Introduction 


If  I  interpret  my  task  this  morning  correctly,  it  is  to  present  to  an 
audience  composed  largely  of  psychologists  and  other  social  scientists  a 
review  of  the  clustering  methods  which  biological  taxonomists  have  employed 
in recent years. There has been considerable activity in this field which,
as  many  of  you  know,  has  come  to  be  called  numerical  taxonomy.  Although  we 
biologists  are  newcomers  in  this  field  compared  to  the  social  scientists 
we  have  managed  to  accumulate  a  variety  of  methods  in  relatively  few  years. 

So,  I  could,  in  fact,  report  on  a  substantial  number  of  different  clustering 
approaches.  However,  I  shall  only  sketch  in  scant  outlines,  since  with  one 
or  two  minor  exceptions  the  techniques  in  biology  are  fundamentally  akin  to 
those of the social sciences (Ball, 1965), and there seems little point in
reintroducing you to methods long familiar but disguised in biological garb.

You  have  undoubtedly  been  struck  by  the  wide  generality  of  your 
approaches  across  other  disciplines  of  science.  But  it  is  important  not  to 
be  overly  impressed  by  this  phenomenon.  There  are,  in  fact,  fundamental 
differences,  not  in  the  mechanics  of  cluster  analysis,  but  in  the  philosophical 
assumptions  accompanying  its  use,  which  differ  markedly  among  the  various 
fields  of  application.  And  I  hope  to  spend  the  greater  part  of  my  time 
explaining  to  you  the  bases  of  these  assumptions  in  biology  to  permit  you 
to  contrast  these  with  the  assumptions  upon  which  you  have  been  basing  your 
work.  I  feel  that  such  an  approach  should  be  of  interest  to  you.  Through 
an  appreciation  of  the  differences  in  approach  in  clustering  philosophy  in 
other  sciences  I  have  gained  more  insight  into  my  own  research  field  and 
possibly  similar  benefits  may  accrue  to  you  from  such  a  comparative  approach. 

Principles  of  taxonomy 

Before  we  proceed  we  should  define  taxon  as  meaning  a  taxonomic  group 
or  class  of  any  nature  and  rank.  Operational  taxonomic  units  (OTU's)  are 
the  lowest  ranking  taxa  in  a  given  study.  They  are  the  basic  units  that 
are to be grouped into higher ranking taxa. A character is a property or
feature which varies from one OTU to another. It is coded into
distinguishable states. Thus hairiness of a leaf is a character. Slight, medium and
heavy may be the three states in which this character occurs among the OTU's
to be classified. A possible source of confusion is the difference between
the terms "classification" and "identification." Where a set of unordered
objects has been grouped on the basis of like properties, biologists call
this "classification." Once a classification has been established, the
allocation of additional unidentified objects to the correct class is
generally known as "identification." Some mathematicians and philosophers
would  also  call  this  second  process  "classification,"  but  I  am  principally 
concerned  with  classification  in  the  biologist's  sense. 

Fundamental  criteria  of  a  classification  have  been  defined  by  Williams 
and Dale (1965) who state that for a grouping of OTU's to be considered a
classification three requirements must be met (paraphrased for biological
taxonomy): (1) Within every taxon containing more than one OTU there must
be, for every OTU, at least another OTU with which it shares minimally one
relevant character state. (2) Membership in the taxon may not itself be a
relevant character. (3) Every OTU in any one taxon must differ in at
least one relevant character state from every OTU in every other taxon.
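
Requirements (1) and (3) lend themselves to a mechanical check, sketched below; requirement (2) concerns the choice of characters and cannot be tested from the coded data alone. The representation of an OTU as a tuple of character states, the function names, and the sample data are all assumptions made for illustration.

```python
def shares_state(a, b):
    """True if two OTU's agree on the state of at least one character."""
    return any(x == y for x, y in zip(a, b))


def differs(a, b):
    """True if two OTU's disagree on the state of at least one character."""
    return any(x != y for x, y in zip(a, b))


def meets_requirements(otus, taxa):
    """Check requirements (1) and (3) for a grouping of OTU indices into taxa."""
    # (1) In a taxon of two or more, every OTU shares a state with some other member.
    for taxon in taxa:
        if len(taxon) > 1:
            for i in taxon:
                if not any(shares_state(otus[i], otus[j]) for j in taxon if j != i):
                    return False
    # (3) OTU's placed in different taxa differ in at least one character state.
    for a in range(len(taxa)):
        for b in range(a + 1, len(taxa)):
            for i in taxa[a]:
                for j in taxa[b]:
                    if not differs(otus[i], otus[j]):
                        return False
    return True


# Hypothetical OTU's coded on two characters (leaf surface, flower color).
otus = [
    ("hairy", "red"),
    ("hairy", "blue"),
    ("smooth", "white"),
    ("hairy", "red"),
]
print(meets_requirements(otus, [[0, 1], [2]]))  # both requirements hold
print(meets_requirements(otus, [[0], [3]]))     # identical OTU's in separate taxa
print(meets_requirements(otus, [[0, 2]]))       # members share no character state
```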

We  must  also  distinguish  between  taxa  (plural  of  taxon)  and  categories. 

Taxa  are  actual  groupings  observed  in  nature,  regardless  of  the  basis  on 
which  the  grouping  has  been  done.  They  are  allocated  to  categories  which 
are  the  hierarchic  levels  in  a  classificatory  scheme.  Thus  Homo  sapiens, 
carnivores  or  mammals  are  taxa,  while  species,  genera  or  families  are 
categories. 

Most  classifications  are  internal  (Williams  and  Dale,  1965)  by  which 
is  meant  that  the  classification  is  based  upon  criteria  entirely  inherent 
within  the  data  that  are  to  be  classified.  By  contrast,  there  are  external 
classificatory  procedures  in  which  certain  reference  taxa  are  employed  in 
aiding in the classification. An example in point is the non-Linnean
taxonomy of DuPraw (1964) which employs discriminant functions including
both  known  and  unknown  specimens  mapped  in  a  two-dimensional  space  by 
discriminant  analysis. 

While  classifications  in  psychology  and  the  social  sciences  need  not 
always  be  hierarchically  structured,  the  principal  purpose  of  biological 
numerical taxonomy is to group organisms into a hierarchic system of
biological taxa. There are two million different species of living organisms
in the world. These must be grouped if only for convenience of creating
order in a chaos of names and forms, but also because a sound taxonomic
system will reveal much that is useful and of interest about the evolutionary
mechanisms that have given rise to the diversity of kinds of organisms
existing in the world today. The principle of biological evolution is
fundamental to an understanding of the nature of biological taxa and the
discontinuities among them. This is reflected in the commonly accepted
belief that there is just one "natural" system which, if only found, would
be  the  obvious  classification  of  the  group  under  study.  Traditionally 
this  natural  classification  has  always  been  an  evolutionary  one.  Presumably 
the  organisms  constituting  a  taxon  are  related  by  common  descent.  If  we 
could  only  go  back  in  the  fossil  record  of  a  natural  group,  we  would  find 
a  common  ancestor  for  them  before  encountering  a  common  ancestor  for  these 
forms and those in another taxon of equal rank. However, it has been
emphasized in recent years that there are at least two fundamental kinds of
relationships among taxonomic units: phenetic relationships, which are based
on overall similarity in terms of the characteristics which are measured,
and cladistic relationships, based on common descent as described above.

Most  conventionally  stated  taxonomic  relationships  contain  an  undefined 
mixture of the two (Sokal and Camin, 1965). Naturalness in a phenetic
sense  is  understood  to  mean  maximal  overall  similarity  within  a  taxon  as 
contrasted  with  substantial  differences  from  other  taxa. 

Thus,  in  attempting  to  set  up  a  natural  system  we  have  to  say  whether 
it is natural in a phenetic or a cladistic sense. Most of the work in
numerical taxonomy so far has dealt with phenetic systems, taxonomies based
on  overall  similarity  which  may  or  may  not  reflect  closeness  of  evolutionary 
relationship.  These  systems  are  of  general  utility.  Phenetic  taxa  in  a 
natural  system  should  be  cohesive  and  have  a  high  predictive  value  for 
characters  other  than  those  upon  which  the  taxonomy  has  been  based.  This 
brings  up  the  problem  of  character  selection,  which  does  not  loom  as  large 
in  sociology  and  psychology,  because  only  characters  of  interest  are  chosen. 
Thus,  if  we  want  to  classify  individuals  on  the  basis  of  their  attitudes  to 
drinking,  we  might  only  classify  them  on  responses  related  to  this  variable, 
but  would  not  necessarily  classify  them  on  their  physiology,  their  attitudes 
to  art,  or  their  driving  habits.  The  question  is  whether  there  are  natural 
taxa  of  personality  types  rather  than  different,  partially  intersecting 
facets  of  the  personality.  In  biology  we  wish  to  represent  as  fairly  and 
exhaustively  as  we  can  the  genetic  structure  of  the  individual  populations 
under  study,  and  this  leads  to  serious  problems  of  character  selection  as 
we  shall  see. 

Fundamental to the establishment of any taxonomy is the decision on
whether taxa are to be monothetic or polythetic. A monothetic group is
defined  by  the  possession  of  a  unique  set  of  features,  and  classification 
on  monothetic  principles  is  a  series  of  successive  logical  divisions  into 
ever  smaller  subsets  sharing  one  or  more  states  of  a  character.  By  contrast, 
a  polythetic  classification  places  together  organisms  that  have  the  greatest 
number  of  shared  features.  No  single  feature  is  either  essential  to  group 
membership  or  is  sufficient  to  make  an  organism  a  member  of  this  group. 

Similarity  coefficients 

Any consideration of clustering methods must concern itself with the
nature of the data to be clustered. A few of the methods extract structure
directly from the original data matrix, which is a rectangular matrix whose
columns are operational taxonomic units (the OTU's to be clustered) and whose
rows are the characters on the basis of which the clustering proceeds. The
characters are coded numerically into a number of states or as a continuous
function. In the majority of cases we first compute from the data matrix
a matrix of similarity coefficients, which expresses the pair-wise relationships
among all the OTU's of the study. These coefficients of similarity
are of three basic kinds: coefficients of association, which in some way
express the measure of agreement in character states that actually exists
between any pair of OTU's as a proportion of the total amount of agreement
that could exist; correlation coefficients among OTU's, based on the
characters of the data matrix (this is the conventional Q-type analysis of
the psychologists); and a measure of Euclidean distance between OTU's in a
character-space. For purposes of this discussion I shall confine myself to
correlation and distance coefficients, with which I have had
most experience. Many of the association coefficients can be transformed
to distances or functions thereof. An important consideration first pointed
out by Williams and Dale (1965) is that while studies of the relationships
among OTU's, whether measured as correlations or as distances, have both
been termed Q-studies following the lead of the psychometricians, there is
a profound difference between these. A matrix of correlations between
pairs of OTU's represents angles among OTU's in a space whose dimensions
represent the OTU's. Thus, there are maximally as many dimensions as there
are OTU's. On the other hand, a distance matrix among pairs of OTU's, while
also a Q-study, shows distances among OTU's imbedded in a character space.
That is, the dimensions of the hyperspace represent the separate
characters, or, as seen in the three-dimensional representations of OTU's which
we have been preparing for purposes of study and analysis, these three
dimensions represent linear combinations of the characters (three eigenvectors
corresponding to the three largest eigenvalues of the character
correlation matrix). Conversely, correlations among characters would
represent angles in a character space. Distances among characters are not
generally computed, but if they were, they would be imbedded in a space
whose dimensions were the individuals of the study. Williams and Dale
(1965) have called the character space an A-space (from attribute space)
while the space whose dimensions represent the OTU's has been called an
I-space (from individual space).
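
The two coefficients discussed above can be illustrated with a minimal sketch. It assumes a small, made-up data matrix laid out as described (rows are characters, columns are OTU's); the function names `otu_correlation` and `otu_distance` and all of the numbers are hypothetical and serve only as an illustration.

```python
import math

def otu_correlation(data, j, k):
    """Correlation between OTUs j and k (columns), taken over the characters (rows)."""
    x = [row[j] for row in data]
    y = [row[k] for row in data]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def otu_distance(data, j, k):
    """Euclidean distance between OTUs j and k, one axis per character."""
    return math.sqrt(sum((row[j] - row[k]) ** 2 for row in data))

# Hypothetical data matrix: three characters (rows) scored on OTUs a, b, c (columns).
data = [
    [1.0, 2.0, 1.5],
    [2.0, 4.0, 1.5],
    [3.0, 6.0, 3.5],
]

r_ab, r_ac = otu_correlation(data, 0, 1), otu_correlation(data, 0, 2)
d_ab, d_ac = otu_distance(data, 0, 1), otu_distance(data, 0, 2)
print(f"r(a,b)={r_ab:.3f}  r(a,c)={r_ac:.3f}")
print(f"d(a,b)={d_ab:.3f}  d(a,c)={d_ac:.3f}")
```

In this made-up case OTU's a and b are perfectly correlated, because b is simply a scaled up, yet a lies closer to c than to b. The two kinds of coefficient therefore need not order the same pairs of OTU's in the same way.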

Several  characteristics  of  the  similarity  coefficients  profoundly 
affect  the  clustering  methods.  The  similarity  function  should  be  metric, 
that is, it should satisfy the requirements of symmetry and the triangle inequality,
and it should be non-zero for nonidentical elements and zero for identical
ones.  Most  coefficients  proposed  in  numerical  taxonomy  have  been  metric. 

Some  semimetric  and  asymmetric  similarity  coefficients  have  been  proposed 
in  numerical  taxonomy  and  in  some  instances  such  as  immunological  similarity 
may  be  justified.  However,  such  coefficients  greatly  complicate  the 
clustering  and  analysis  of  the  OTU's. 
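
The metric requirements listed above can be checked mechanically for any given matrix. The following is a sketch only: it assumes the coefficients are expressed as dissimilarities in a square list of lists, and that no two OTU's are identical in all of their characters; the name `is_metric` and both example matrices are hypothetical.

```python
def is_metric(d, tol=1e-12):
    """Check a square dissimilarity matrix against the metric requirements."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False            # an element must be at zero distance from itself
        for j in range(n):
            if i != j and d[i][j] <= tol:
                return False        # distinct elements must be a non-zero distance apart
            if abs(d[i][j] - d[j][i]) > tol:
                return False        # symmetry
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    return False    # triangle inequality
    return True

# Hypothetical examples: the first obeys all requirements; the second breaks
# the triangle inequality, since 3 > 1 + 1, and is therefore only semimetric.
euclidean_like = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
semimetric = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]

print(is_metric(euclidean_like), is_metric(semimetric))
```

The check is cubic in the number of OTU's, which is acceptable for matrices of the size usually met in taxonomic studies.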

General  considerations  of  the  relations  among  the  similarity  coefficients 
are  in  order.  For  instance,  since  the  taxonomic  relations  resulting  from  the 
cluster  analysis  are  to  be  in  the  nature  of  universals,  it  is  important  that 
one-to-one  relations  between  these  coefficients  be  established,  although,  of 
course, these coefficients cannot be linear functions of each other; otherwise,
there would be little point in preferring one over the other. One
would at least hope that monotonicity of the similarity function is retained.
In fact, however, it can be easily demonstrated that the various similarity
functions  so  far  employed  in  numerical  taxonomy  are  not  jointly  monotonic. 
Decisions,  therefore,  have  to  be  taken  upon  the  choice  of  coefficients,  based 
partly  on  the  model  of  the  type  of  similarity  which  it  is  desired  to  portray 
and  partly  on  the  mathematical  properties  of  the  coefficients. 
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Another  important  consideration  of  a  classification  is  stressed  by 
Williams  and  Dale  (1965).  It  is  not  necessarily  true  that  a  given  similarity 
function  used  to  set  up  a  classification  at  the  lower  hierarchic  level  will 
decrease  (or  increase)  monotonically  as  we  ascend  the  hierarchy,  an  example 
of  this  is  the  Spearman's  sums  of  variables  method  which  frequently  leads  to 
reversals  in  the  value  of  the  correlation  coefficient  when  clusters  join, 
as  noted  by  Sokal  and  Michener  (1958).  Furthermore,  in  certain  types  of 
approaches  the  consequential  nested  hierarchies  are  not  retained  and  several 
members  of  a  subset  at  a  low  hierarchic  level  may  split  up  to  become 
members  of  different  sets  at  a  higher  hierarchic  level.  Such  relationships 
have  been  observed,  among  others,  by  Rubin  (1966)  in  his  optimal  taxonomy 
program. 

One  decision  that  must  be  made  is  whether  a  similarity  index  is  to  be 
constructed  which  will  indicate  what  is  most  similar  to  the  human  observer 
or  whether  such  an  index  can  measure  what  might  be  described  as  the 
intrinsic  similarity  between  two  objects  based  on  their  component  parts, 
this  latter  similarity  not  necessarily  congruent  with  the  one  apparent  to 
the  observer. 


Clustering  methods 

The  three  main  clustering  methods  employed  in  biology  have  been  the 
methods described as linkage methods by Sokal and Sneath (1963). In all of
these methods the criterion for joining is gradually lowered from an initial,
high similarity value at which all OTU's are represented by a disjoint
partition  (single  OTU's  in  a  subset)  to  low  similarity  values  at  which  the 
classification  is  represented  by  a  conjoint  partition  (all  OTU's  are  in  the 
same  taxon).  Single  linkage  described  by  Sneath  (1957)  permits  a  single 
linkage  between  an  OTU  and  a  cluster  or  between  two  clusters  to  establish  a 
new,  more  inclusive  cluster.  While  two  clusters  may  be  linked  by  the  single 
linkage  technique  on  the  basis  of  a  single  bond,  many  of  the  members  of  the 
two  clusters  may  be  quite  far  removed  from  each  other.  To  overcome  this 
difficulty,  Sneath  has  recommended  recalculating  mean  similarity  values 
both within and between groups (see Sokal and Sneath, 1963, page 181).

Wirth,  Estabrook  and  Rogers  (1966)  use  graph  theoretical  techniques  and 
representation  to  carry  out  what  is  essentially  a  single  linkage  method. 
Clustering  by  complete  linkage  requires  that  a  given  OTU  or  a  cluster 
joining another cluster at a certain similarity coefficient S must have
relations  at  that  level  or  above  with  every  member  of  the  cluster  to  be 
joined.  This  yields  compact  and  conservative  clusters  compared  to  the  long, 
strung-out  classifications  of  single  linkage.  The  average  linkage  method 
calculates  average  similarities  of  clusters  with  prospective  joiners  and 
since its initial development by Sokal and Michener (1958), classifications
based  on  it  and  on  its  various  modifications  have  demonstrated  higher 
cophenetic  correlations  with  the  original  similarity  coefficients  than 
classifications  based  on  other  clustering  methods. 
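
The three joining criteria described above differ only in how the similarity between two clusters is summarized. A minimal sketch follows, under stated assumptions: the input is a symmetric similarity matrix, ties are broken arbitrarily, and the "average" rule is the plain mean of all between-cluster similarities, which is one simple variant and not necessarily the exact procedure of any cited author. The names `cluster` and `between` and the matrix values are hypothetical.

```python
def cluster(sim, method="average"):
    """Agglomerate OTUs from a symmetric similarity matrix.

    Returns the merges in order as (members_a, members_b, joining_level).
    """
    clusters = [(i,) for i in range(len(sim))]
    merges = []

    def between(a, b):
        values = [sim[i][j] for i in a for j in b]
        if method == "single":      # one sufficiently strong bond is enough
            return max(values)
        if method == "complete":    # every member must be related at this level
            return min(values)
        return sum(values) / len(values)   # mean of all between-cluster bonds

    while len(clusters) > 1:
        # Join the pair of clusters with the highest between-cluster similarity.
        level, a, b = max(
            (between(a, b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, level))
    return merges

# Hypothetical similarities among four OTUs lying roughly along a chain.
sim = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.8, 0.4],
    [0.5, 0.8, 1.0, 0.7],
    [0.2, 0.4, 0.7, 1.0],
]

for method in ("single", "complete", "average"):
    print(method, [round(level, 3) for _, _, level in cluster(sim, method)])
```

With these made-up values single linkage adds one OTU at a time along the chain, while the complete and average rules first form two pairs and join them only at a much lower level.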

Lockhart  and  Hartman  (1963)  have  developed  a  technique  for  successively 
subdividing  large  numbers  of  bacterial  species  into  groups  by  monothetic 
criteria. Their results were, in effect, similar to those obtained by
polythetic methods. The method by Camin and Sokal (1965) for clustering
OTU's in preparation for cladistic analysis is another modified method.

Several studies are now available comparing different methods of clustering
(see Lange, Stenhouse and Offler, 1965; Williams, Lambert and Lance, 1966;
and Sokal and Michener, 1967). Without discussing these studies in detail,
we can summarize them by stating that different similarity coefficients as
well as different clustering operations yield appreciably different
phenograms from the same data. Sokal and Michener (1967) conclude that
"As to clustering procedures all the different methods tried produce
somewhat different results. . . .

"It is becoming clear that the procedures for clustering OTU's will
need considerable scrutiny and improvement if the aim of achieving stability
in classification is to be realized. Each of the methods of clustering
so far tends to bias the resulting clusters in certain ways. Thus, for
example, the weighted pair-group method with arithmetic averages assumes
that OTU's occur in nested, dendritic clusters. It will best cluster OTU's
from a similarity matrix which does in fact have such phenetic relationships
and it will tend to impose dendritic relationships upon data that are
not markedly dendritic. The degree to which the phenogram reflects the
similarity matrix (cophenetic correlation) must indicate the degree to
which the clustering method represents the underlying structure among the
OTU's. It is therefore important to investigate this structure by a variety
of techniques and to ascertain the nature of the phenetic constellations
of OTU's in different taxonomic groups. Given an understanding of the
phenetic structure of a taxonomic group, it should be possible to recommend
an appropriate clustering method for it. No one clustering method is likely
to serve well in every instance. To give an extreme example, members of a
continuous cline clearly would not be appropriately clustered by any of the
average linkage methods."

A major unresolved problem of cluster analysis in biology is the fact
that few, if any, clustering methods have been devised which do not in some
way bias the resulting classification. The average linkage method will
tend to give best results with hyperspheroidal clusters separated by
substantial gaps, single linkage does well with strung-out data, and so
forth. Moreover, these clustering methods tend to bias the resulting
structures in the direction implied by the clustering procedure. It is,
therefore, of considerable importance to try to establish general clustering
procedures whose algorithm would vary depending on the scatter and distribution
of the OTU's to be clustered. Thus, if the OTU's are in fact spheroidally
clustered the average linkage procedure might well be used. If, on the other
hand, more complex shapes such as hyperserpentines, hyperdumbbells,
hyperdoughnuts, or even hyperfleurs-de-lys are closer to a representation of the
essential distribution of the points in hyperspace, then the clustering
program should adjust itself to such patterns. Such self-adjusting programs
are still not extensively developed, but it seems to me that we shall not
be representing nature faithfully, nor learn much about the forces that have
resulted in the phenetic patterns being observed, unless we produce programs
of this sort. Rohlf (1967) has developed a clustering procedure which
departs from the conventional hyperspheroid by allowing hyperellipsoid
clusters, reaching out farther in some directions away from the center than
in others.

To avoid the distortions necessitated by the two-dimensional representation
of phenograms, numerical taxonomists have recently turned increasingly to
other means of representation of taxonomic relationships. Among the most
popular is the three-dimensional plotting of OTU's either as models or in
two-dimensional perspectives. In such plots, the dimensions usually represent
the three eigenvectors with the largest eigenvalues from the character
correlations and are thus linear combinations of the characters. It has been
our experience that the
first three factors usually extract 50-70 per cent of the overall variance.
However, the cophenetic correlations (see below) between distances in the
resulting three-space and the original similarity matrix are always above
0.90. Such representation leaves, of course, the actual categorization
unresolved, and methods will have to be developed for handling such problems.
Most recently Rohlf (196?) has developed a method for representing taxa in
stereograms which give the illusion of three-dimensional projection when
examined with stereoscopic glasses.

Some other considerations


An  important  question  related  to  choice  of  characters  is  how  many  and 
which characters to choose to establish a stable natural classification.
Numerical  taxonomists  have  maintained  that  as  the  number  of  characters 
employed  increases  an  asymptote  of  information  is  reached,  and  that  equal 
increments  in  numbers  of  characters  employed  will  provide  decreasing 
perturbations  of  the  taxonomy.  This  seems  obvious  from  a  statistical  point 
of  view  if  we  can  conceive  of  the  characters  as  randomly  selected  from  an 
infinite  population  of  possible  characteristics  measuring  similarity  among 
a  given  pair  of  OTU's.  Experiments  are  under  way  to  test  this  hypothesis, 
and  we  are  not  yet  in  a  position  to  render  final  judgment  upon  it.  This 
line  of  argument  leads,  however,  to  a  position  where  each  sample  of  characters 
in  a  taxonomic  study  is  considered  equivalent  to  every  other  sample  of 
characters,  both  from  the  point  of  view  of  importance  (the  assumption  of 
equal  weighting  of  characters  in  expressing  similarity)  as  well  as  from  the 
point  of  view  of  providing  equivalent  information  about  similarities.  This 
latter point is important, because it assumes that regardless of what sets
of characters we choose, be these external or internal morphological characters
or biochemical or physiological characters, we should be able to
obtain identical taxonomies. Investigations of this hypothesis of nonspecificity
by Rohlf (1963) and Michener and Sokal (1966) have shown that
different sets of characters will yield similar but not identical
classifications,  measures  of  the  replicability  of  the  classification 
yielding  cophenetic  correlations  between  OAZ  and  0.85. 

Results  from  these  studies  as  well  as  from  another  study  in  which 
independent  investigators  reclassified  identical  sets  of  objects  lead  to  the 
recognition  of  what  Rohlf  has  called  the  uncertainty  principle  in  taxonomy. 
This  simply  states  that  it  is  impossible  to  reclassify  by  conventional  or 
numerical  means  the  same  set  of  organisms  and  obtain  comparable  results 
beyond  a  certain  degree  of  replicability.  The  resemblance  among  successive 
classifications  may  be  very  great  (cophenetic  correlations  on  the  order  of 
0.85).  On  the  other  hand,  the  uncertainty  may  be  considerably  greater.  Our 
experience  in  this  field  has  not  yet  been  sufficient  to  indicate  between 
which  bounds  this  uncertainty  may  lie. 




On  what  criterion  can  a  classification  be  judged?  In  the  early  days  of 
numerical  taxonomy,  the  success  of  a  numerical  classification  was  generally 
judged  by  the  similarity  of  the  outcome  to  those  classifications  established 
by  conventional  means.  As  the  subject  developed,  there  seemed  no  inherent 
reason  why  the  traditional,  somewhat  intuitive,  classifications  should  be 
considered  as  the  final  arbiter,  and  attempts  were  made  to  develop  internally 
sufficient  criteria  for  the  goodness  of  a  classification.  Two  main  approaches 
have been followed. Sokal and Rohlf (1962) have used the method of cophenetic
correlation  which  consists  of  correlating  the  original  similarity  matrix  with 
so-called  cophenetic  values  which  are  the  values  of  similarity  implied  by  the 
structure  of  a  given  classificatory  phenogram.  Phenograms  are  two-dimensional 
representations  of  taxonomic  structure  in  terms  of  trees  with  the  axis 
parallel  to  the  stem  of  the  tree  representing  phenetic  similarity.  Because 
phenograms  collapse  multidimensional  relationships  into  two  dimensions,  there 
is  appreciable  distortion  of  the  original  relationships  as  shown  in  the 
similarity matrix. The goodness of a classification can now be measured as
the magnitude of the correlation between a phenogram and the original similarity
matrix.  It  is,  of  course,  desired  that  the  phenogram  represent  as  much  as 
possible  the  phenetic  similarity  as  shown  in  the  similarity  matrix.  Of  two 
taxonomic  representations  based  on  the  same  similarity  matrix,  that  with  the 
higher  cophenetic  correlation  is  to  be  preferred.  A  method  recently 
developed by Rohlf (1967) permits the moving of some of the branches on a
trial-and-error basis into positions yielding higher cophenetic correlations.
However,  this  procedure  is  not  yet  practical  for  very  large  matrices, 
except  on  exceedingly  fast  computers. 
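
The cophenetic correlation described above can be sketched in a few lines. The sketch assumes that a phenogram is supplied as a list of joins, each giving the two groups of OTU's united and the similarity level of the join; the cophenetic value of a pair of OTU's is then the level at which the two first fall into the same cluster. The function names, the similarity matrix, and both phenograms are hypothetical.

```python
import math

def cophenetic_values(n, merges):
    """Similarity implied by a phenogram for every pair of OTUs."""
    coph = [[None] * n for _ in range(n)]
    for group_a, group_b, level in merges:
        for i in group_a:
            for j in group_b:
                coph[i][j] = coph[j][i] = level
    return coph

def cophenetic_correlation(sim, merges):
    """Correlate original similarities with cophenetic values over all pairs."""
    n = len(sim)
    coph = cophenetic_values(n, merges)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [sim[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical similarity matrix with two evident pairs, (0, 1) and (2, 3).
sim = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]

# Two candidate phenograms, written as (group_a, group_b, joining level).
faithful = [((0,), (1,), 0.9), ((2,), (3,), 0.8), ((0, 1), (2, 3), 0.3)]
distorted = [((0,), (2,), 0.9), ((1,), (3,), 0.8), ((0, 2), (1, 3), 0.3)]

print(round(cophenetic_correlation(sim, faithful), 3))
print(round(cophenetic_correlation(sim, distorted), 3))
```

Of the two made-up phenograms, the one that respects the evident pairs yields the higher correlation and would be preferred by the criterion stated above.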

Rubin  (1966)  has  approached  the  subject  from  the  general  point  of  view 
of  establishing  a  stability  function  for  a  given  classification,  which  is 
to  be  a  measure  of  the  homogeneity  within  groups  and  the  inhomogeneity  among 
groups  at  a  given  hierarchic  level.  Once  such  a  function  can  be  defined, 
one  obviously  wishes  to  maximize  it,  that  is,  one  wishes  to  arrange  the  OTU's 
within  a  classification  in  such  a  way  that  the  function  becomes  maximized. 

Since  any  given  classificatory  procedure  will  not  result  in  maximization  of 
the  function,  rearrangement  of  the  OTU's  among  the  classes  to  yield  an 
improved classification can be attempted by a variety of algorithms. Rubin's
hill-climbing  algorithm  proceeds  to  follow  up  improvements  of  his  stability 
criterion.  In  fact,  once  such  a  criterion  for  stability  or  goodness  of  a 
classification  is  accepted,  then  almost  any  randomly  chosen  classification 
of  objects  can  be  successively  improved  by  a  series  of  iterative  steps 
yielding  successively  higher  criteria. 
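
The iterative improvement just described can be sketched as a simple hill-climbing loop. The stability function used here, mean similarity within groups minus mean similarity between groups, is an assumption made purely for illustration; it is not Rubin's criterion, which is not reproduced in this paper. The names `stability` and `hill_climb` and the data are likewise hypothetical.

```python
def stability(sim, labels):
    """Assumed criterion: mean within-group minus mean between-group similarity."""
    within, between = [], []
    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            (within if labels[i] == labels[j] else between).append(sim[i][j])
    if not within or not between:
        return float("-inf")    # reject arrangements lacking either kind of pair
    return sum(within) / len(within) - sum(between) / len(between)

def hill_climb(sim, labels, n_groups):
    """Move one OTU at a time to another group whenever the criterion improves."""
    labels = list(labels)
    best = stability(sim, labels)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for g in range(n_groups):
                if g == labels[i]:
                    continue
                trial = labels[:i] + [g] + labels[i + 1:]
                score = stability(sim, trial)
                if score > best + 1e-12:
                    labels, best, improved = trial, score, True
    return labels, best

# Hypothetical similarities with two evident pairs, (0, 1) and (2, 3).
sim = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]

# Start from a deliberately poor arrangement that splits both pairs.
labels, score = hill_climb(sim, [0, 1, 0, 1], n_groups=2)
print(labels, round(score, 3))
```

A procedure of this kind stops at the first arrangement that no single move can improve, so it may reach only a local maximum of the criterion; restarting from several initial classifications is a common precaution.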

Of special interest are some types of self-adjusting clustering methods
which have been described in the literature. These include conceptually
simple, but computationally complex methods such as curves derived from
scattered points which represent the essential trends of these points
(Sneath, 1966). Other techniques seek by some method of cluster analysis
to classify a series of OTU's and subsequently use each further OTU to improve
the previous classification, allocating it to previously established
classes unless it would seriously disagree with the established classificatory
scheme (Ornstein, 1965). This scheme is essentially a "learning" classification
program improving its performance for a given set of data after having
initially classified a certain number.


A Mutual Development of Theory and Method in
Objective Analysis of Personality Structure

Louis  L.  McQuitty 
Michigan  State  University 

In September of 1965 I summarized my approaches up to that time under
the title "A Mutual Development of Some Typological Theories and Pattern-
Analytic Methods" (McQuitty, 1967).

I wish now to review more recent developments in my approaches. These
are a continuation of the earlier approaches and are introduced here by
summarizing  my  position  in  both  theory  and  methods  as  of  the  close  of  my 
earlier  review  paper. 


A  Brief  Review 


General 


My  general  approach  is  to  develop  methods  of  analysis  out  of  theories 
of  personality  structure.  Applications  of  methods  to  data  serve  as  hypotheses 
for  testing  theory.  They  lead  to  the  revision  of  theory  and  the  development 
of  new  methods.  Through  this  approach,  I  attempt  to  develop  both  better 
theory  and  improved  methods. 

My theoretical position as of September 1965 was as follows:

"(1) Every person is an 'imperfect' type as distinct from a 'pure'
type; only 'imperfect' types exist in reality, and 'pure' types
exist only in theory.

(2) There are fewer 'pure' types than 'imperfect' types; each 'pure'
type is represented in reality by two or more 'imperfect' types.

(3) The characteristics of 'pure' types are approached but never
quite realized by classifying 'imperfect' types into internally-consistent
categories, and determining their common characteristics.
The validity of representation of a 'pure' type generally increases
as the number of 'imperfect' types representing it increases.

(4) 'Hierarchical' types include all of the types realized in
classifying 'imperfect' types into larger and larger, internally-consistent
categories; they are the types intermediate between
those of reality and theory, 'imperfect' and 'pure'" (McQuitty, 1966a).

A  category  of  persons  is  said  to  exemplify  a  statistical  type  if  everyone 
in  the  category  is  more  like  every  other  person  in  the  category  than  he  is 
like  any  person  in  any  other  category. 

In  converting  the  theory  of  types  to  a  method  of  analysis,  persons  are 
described  by  patterns  of  characteristics  which  they  possess.  An  index  is 
computed  showing  the  degree  of  relationship  of  every  person  to  every  other 
person  in  terms  of  common  characteristics  and  the  results  are  assembled  in  a 
matrix.  The  matrix  reports  an  index  of  similarity  of  every  person  to  every 
other person.

In accordance with the definition of types, the matrix is searched for
internally-consistent submatrices. A submatrix of two persons is internally
consistent if Individual i is most like Individual j and j is in turn most
like i. Internally-consistent submatrices of higher order are defined analogously.

Internally-consistent submatrices of any size can be isolated by the methods
of Reciprocal Pairs or Rank Order Typal Analysis, as described elsewhere (McQuitty,
196** and 1966a).

Each internally-consistent submatrix defines a hierarchical type. Each
hierarchical type has the characteristics which are common to its members. Each
hierarchical type is assumed to be a better representative of a pure type than is
any one of the imperfect types with which it is compared.

The imperfect types of the internally-consistent submatrix are replaced
in the original matrix by the hierarchical type, and the analysis proceeds in this
fashion until all persons are classified into one of two major hierarchical types
as shown in Figure 1.

Insert Figure 1 about here

An Illustration


An  example,  using  Hierarchical  Analysis  by  Reciprocal  Pairs,  will  help  clarify 
the  general  approach.  The  method  was  applied  to  a  matrix  of  agreement  scores  between 
industrial  companies  which  had  been  analyzed  many  times  in  terms  of  other  pattern 
analytic  methods.  The  agreement  scores  for  these  companies  are  shown  in  Table  1. 


Insert  Table  1  about  here 

Variables A and B represent two construction companies, C and D trucking companies,
E and F grain processing and metal products respectively, and G and H garment
companies with female employees only; the other six companies employed male employees
only.  The  companies  were  assessed  in  terms  of  32  variables.  Each  variable  was 
dichotomized  at  the  median  and  two  companies  agreed  on  a  variable  if  they  were  both 
either  above  or  below  the  median,  but  not  if  one  was  above  and  the  other  below  the 
median.  The  agreement  score  (Zubin,  1938)  is  the  number  of  items  on  which  the  two 
companies  agree. 
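
The computation of the agreement score can be sketched as follows. The data are made up (three variables on four companies, not the 32 variables of the study), the function names are hypothetical, and the treatment of a value falling exactly at the median, counted here as not above it, is an assumption that the text does not settle.

```python
from statistics import median

def dichotomize(values):
    """Mark each company as above the median (True) or not (False) on one variable."""
    m = median(values)
    return [v > m for v in values]

def agreement_scores(data):
    """data[v][c] is the value of variable v for company c.

    Returns a square matrix whose entry [a][b] is the number of variables on
    which companies a and b fall on the same side of the median.
    """
    binary = [dichotomize(row) for row in data]
    n = len(data[0])
    return [
        [sum(row[a] == row[b] for row in binary) for b in range(n)]
        for a in range(n)
    ]

# Hypothetical measurements: rows are variables, columns are companies.
data = [
    [10, 12, 3, 4],
    [5, 6, 1, 2],
    [7, 1, 8, 2],
]

scores = agreement_scores(data)
for row in scores:
    print(row)
```

The diagonal of the resulting matrix equals the number of variables, since every company agrees with itself on all of them.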

The reciprocal pairs of Table 1 are underlined. They are for Pairs AB, CD,
EF, and GH. Company A, for example, has Company B most like it, and Company B in
turn  has  Company  A  most  like  it,  thus  fulfilling  the  requirements  of  reciprocity 
as  used  here. 
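
Reciprocity in this sense is easy to test mechanically. The sketch below assumes a symmetric matrix of agreement scores and breaks ties in favor of the first candidate encountered, which is an arbitrary choice; the scores shown are invented and are not those of Table 1.

```python
def most_like(scores, i):
    """Index of the entity having the highest agreement score with entity i."""
    candidates = [j for j in range(len(scores)) if j != i]
    return max(candidates, key=lambda j: scores[i][j])

def reciprocal_pairs(scores):
    """Pairs (i, j) in which each member has the other most like it."""
    pairs = []
    for i in range(len(scores)):
        j = most_like(scores, i)
        if i < j and most_like(scores, j) == i:
            pairs.append((i, j))
    return pairs

# Hypothetical agreement scores (out of 32) among five companies A to E.
names = "ABCDE"
scores = [
    [32, 29, 20, 18, 22],
    [29, 32, 21, 19, 20],
    [20, 21, 32, 26, 23],
    [18, 19, 26, 32, 17],
    [22, 20, 23, 17, 32],
]

print([(names[i], names[j]) for i, j in reciprocal_pairs(scores)])
```

In this invented matrix Company E has C most like it, but C has D most like it, so E is not a member of any reciprocal pair at this stage.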

Companies  A  and  B  have  in  common  29  of  the  32  characteristics  on  which  they 
were  assessed.  These  two  companies  are  collapsed  into  a  single  hierarchical  type  AB 
and are characterized by their 29 common characteristics. In a similar fashion
members of the other three reciprocal pairs are collapsed into three Hierarchical Types,
CD,  EF,  and  GH,  described  by  26,  21,  and  24,  common  characteristics  respectively. 

The  agreement  score  of  every  hierarchical  type  with  every  other  hierarchical 
type  is  computed  and  the  results  are  reported  in  Table  2.  Table  2  is  analyzed  in 


Insert  Table  2  about  here 


the  same  fashion  as  Table  1.  Results  of  the  analysis  of  the  two  tables  are  shown 
in Figure 1.


Purifying the Data

Hierarchical  Classification  by  Reciprocal  Pairs  attempts  to  purify  the  data  in 
relation  to  types  as  the  analysis  proceeds  (McQuitty,  1966a).  Lower  level  types  are 
assumed  to  be  more  imperfect  than  higher  level  types.  Consequently,  when  any  two 
imperfect types such as E and F of Figure 1 are combined into a single higher type,
EF, this latter type is assumed to possess only the characteristics which the two
imperfect  types,  E  and  F,  have  in  common. 

The  above  assumption  was  applied  to  the  agreement  scores  of  Table  1  and  produced 
the classification reported in Figure 1.
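
The purification assumption can be made concrete with a short sketch. It assumes that each type is represented as a mapping from a variable to its dichotomized state, that a combined type keeps only the states on which its two members agree, and that the agreement score between two types counts the retained variables on which they share a state. The representation, the function names, and the four profiles are hypothetical and are not the data of Table 1.

```python
def agreement(p, q):
    """Number of retained characteristics on which two types share the same state."""
    return sum(1 for v in p if v in q and p[v] == q[v])

def combine(p, q):
    """A higher type keeps only the characteristics its two members have in common."""
    return {v: s for v, s in p.items() if q.get(v) == s}

def classify(profiles):
    """Repeatedly collapse reciprocal pairs; return the joins per level and the result."""
    types = dict(profiles)
    history = []
    while len(types) > 1:
        names = list(types)

        def nearest(a):
            return max((b for b in names if b != a),
                       key=lambda b: agreement(types[a], types[b]))

        pairs = [(a, b) for a in names for b in names
                 if a < b and nearest(a) == b and nearest(b) == a]
        if not pairs:
            break   # no reciprocal pair found (possible when scores are tied)
        for a, b in pairs:
            types[a + b] = combine(types.pop(a), types.pop(b))
        history.append([a + b for a, b in pairs])
    return history, types

def profile(states):
    return {v: bool(s) for v, s in enumerate(states)}

# Hypothetical dichotomized scores of four companies on six variables.
profiles = {
    "E": profile([1, 1, 1, 1, 0, 0]),
    "F": profile([1, 1, 1, 0, 0, 0]),
    "G": profile([1, 0, 0, 0, 1, 1]),
    "H": profile([1, 0, 0, 1, 1, 1]),
}

history, final = classify(profiles)
print(history)
print(final)
```

In this invented example the types EF and GH each retain five of the six characteristics, and the final type EFGH retains only the single characteristic common to all four companies.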

That  Hierarchical  Classification  by  Reciprocal  Pairs  does  in  fact  sometimes 
purify  the  data  can  be  illustrated  by  comparing  the  results  from  it  with  analogous 
results  from  Rank  Order  Typal  Analysis. 

Rank  Order  Typal  Analysis  makes  no  effort  to  purify  data  as  it  proceeds. 

The first step in Rank Order Typal Analysis is to convert the data of Table 1,
for example, into ranks within columns, where the highest rank of each column is
assumed to be the entry r_ii for the reliability of each person with himself.

The ranks within columns of the data of Table 1 are shown in Table 3. This

Insert Table 3 about here


latter table shows in Column C, for example, that C is assumed to be most like C and
is therefore assigned a rank of 1. The other ranks are assigned in terms of the
relative size of the agreement scores in Table 1. Company D has the largest agreement
score with C (except for C with itself) and is therefore assigned a rank of 2.
Other ranks are assigned in an analogous fashion.

A Rank Order Analysis of Table 3, as reported earlier (McQuitty, 1963), produces
the results shown in Figure 2.

Insert  Figure  2  about  here 


‘nation  of  Table  3  shows  that  Company  E  is  most  like  F  and  in  turn  has 
F  most  i  ;.  The  two  companies  form  a  type  as  shown  in  Figure  2.  The  analogous 
result  is  for  Companies  G  and  H. 

The classifications differ, however, from those obtained with Hierarchical
Classification by Reciprocal Pairs. Companies E, F, G, and H do not form a type
in Rank Order Typal Analysis but they do form a type in Hierarchical Classification
by Reciprocal Pairs.

The difference in the two approaches is emphasized by comparing Table 2 with
Table 3.

Using Rank Order Typal Analysis, Table 3 shows that Companies E, F, G, and H
do not form an internally-consistent category in being like one another. If they
did, there would be no rank larger than four, the number of cases in the submatrix
EFGH by EFGH.
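
The internal-consistency criterion just stated (no rank inside the submatrix larger than the number of its members) can be checked mechanically. The sketch below assumes a hypothetical table of within-column ranks for five items; the values are invented for illustration and are not those of Table 3, and the function name is likewise an assumption.

```python
def is_internally_consistent(ranks, members):
    """True when every rank inside the members-by-members submatrix is <= k,
    where k is the number of members."""
    k = len(members)
    return all(ranks[row][col] <= k for col in members for row in members)


# Hypothetical within-column ranks (rows and columns in the order A, E, F, G, H).
# Each column is a permutation of 1..5 with rank 1 on the diagonal.
ranks = [
    [1, 5, 5, 3, 5],   # A
    [4, 1, 2, 4, 3],   # E
    [5, 2, 1, 5, 4],   # F
    [2, 3, 3, 1, 2],   # G
    [3, 4, 4, 2, 1],   # H
]
E, F, G, H = 1, 2, 3, 4

print(is_internally_consistent(ranks, [E, F]))        # pair E, F
print(is_internally_consistent(ranks, [G, H]))        # pair G, H
print(is_internally_consistent(ranks, [E, F, G, H]))  # all four together
```

In this invented table the pairs EF and GH pass the check while EFGH fails, because F holds rank 5 in Column G.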

On the other hand, when the four companies E, F, G, and H are first classified
into two hierarchical types, EF and GH, and are thereby purified in the method of
Hierarchical Classification by Reciprocal Pairs, they then yield an internally-
consistent pair of hierarchical types as shown in Table 2; this justifies their
classification into a hierarchical type, EFGH.

In  this  example,  Hierarchical  Classification  purified  the  data  as  it  proceeded 
in  the  analysis. 


Some  Limitations  of  Hierarchical 
Classification  by  Reciprocal  Pairs 

Although Hierarchical Classification has many advantages as outlined elsewhere
(McQuitty, 1964, 1965, 1966a, 1966b, 1967), it has certain limitations.

The  initial  classification  begins  at  the  bottom  of  the  hierarchical  system 
and  depends  primarily  on  only  a  few  of  all  of  the  indices  of  association  in  a  matrix. 
Mistakes might occur early in the analysis as a result of using only a few relatively
unreliable indices and might have serious consequences for the subsequent
classifications.


Two  Approaches  toward  a  Solution 

There are two possible attacks on these problems. One approach is to attempt
to increase the reliability and validity of the few indices on which the
classification decisions depend.

Another attack on the problem is to attempt to develop a method which starts
at the top of the hierarchical system, uses all indices, and builds downward. Such
an approach might divide the original matrix into two submatrices and then continue
by dividing the successive submatrices until a structure such as represented in
Figure 1 is built from the top down.

A  Joint  Solution 


General  Description 

In attempting first to solve only the first problem, viz., to increase the
validity of the few indices on which the classifications depend, I discovered an
approach which solves both this problem and the one of using all indices in each
decision. The new method divides the large matrix into submatrices and then divides
successively each submatrix using in each case all of the indices of the matrix or
submatrix on which the operations are performed. Each time before making a division,
the method takes steps designed to increase the reliability and validity of all indices.

Detailed Description

The method is now described in more detail in the order in which it was
developed.



Increasing the Validity of Indices. In attempting to increase the
reliability and validity of the $r_{ij}$ in an N by N matrix of indices between
people, the correlation was computed between corresponding entries of the
columns i and j. This approach gave an index of the extent to which i and
j varied jointly in relation to the other N-2 variables of the matrix. In
other words, the relationship between i and j is estimated by computing the
extent to which they are jointly like N-2 other variables. The new index
is called an intercolumnar correlation, and is designated I.
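
A minimal Python sketch of the intercolumnar correlation follows. It assumes a small invented matrix of indices (not data from this study) and takes literally the statement that i and j are compared over the other N-2 variables, so rows i and j are left out; the function names `pearson` and `intercolumnar` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def intercolumnar(matrix, i, j):
    """Correlate columns i and j over the other N-2 rows (assumed reading)."""
    rows = [r for r in range(len(matrix)) if r not in (i, j)]
    return pearson([matrix[r][i] for r in rows],
                   [matrix[r][j] for r in rows])


# Hypothetical symmetric matrix of indices among five people.
m = [
    [1.0, 0.8, 0.6, 0.2, 0.1],
    [0.8, 1.0, 0.5, 0.3, 0.2],
    [0.6, 0.5, 1.0, 0.4, 0.3],
    [0.2, 0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.3, 0.7, 1.0],
]
print(round(intercolumnar(m, 0, 1), 3))  # persons 0 and 1 have similar profiles
print(round(intercolumnar(m, 0, 4), 3))  # persons 0 and 4 have dissimilar profiles
```

The index for two people thus rests on their whole pattern of relations with everyone else rather than on the single entry $r_{ij}$.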

In testing the validity of the intercolumnar correlations, as compared
with agreement scores, intercolumnar scores were computed for the agreement
scores of Table 1.

The Pearsonian Coefficient was used in computing the intercolumnar
correlations between columns of agreement scores.

In testing the validity of the intercolumnar correlations, they were
pattern analyzed and compared with previous pattern analyses of agreement
scores. Rank Order Typal Analysis of intercolumnar coefficients gave the
same results as Hierarchical Analysis by Reciprocal Pairs.

The computation of intercolumnar correlations from agreement scores
purified the data in a fashion somewhat similar to Hierarchical Classification
by Reciprocal Pairs; the two approaches produced identical classifications.

Based on both other pattern analytic analyses of the data and known
characteristics of the companies, the Rank Order Typal Analysis of the
intercolumnar coefficients produced a more valid picture of the structure
than did the Rank Order Typal Analysis of the agreement scores.

Seeking the Nature of the Improvement. To seek the nature of the error
being corrected by (a) intercolumnar correlations and (b) Hierarchical
Classification by Reciprocal Pairs seemed worthwhile.




As  a  first  step  in  this  direction,  Pearsonian  Coefficients  between  the 
several  companies  were  computed  on  the  basis  of  the  original  32  scales  for 
the  eight  companies,  to  yield  the  matrix  shown  in  Table  4. 


Insert  Table  4  about  here 


This  table  was  converted  to  ranks  within  columns  and  produced  the  same 
results  exactly  as  did  the  ranks  within  columns  of  the  intercolumnar  correlation 
of agreement scores. The ranks for both approaches are shown in Table 5.


Insert Table 5 about here


An  inspection  of  this  table  shows  that  a  Rank  Order  Typal  Analysis  of 
it  will  yield  the  same  classification  as  obtained  by  (a)  Hierarchical 
Classification  by  Reciprocal  Pairs  when  applied  to  the  agreement  scores  and 
(b)  Rank  Order  Typal  Analysis  of  Intercolumnar  Correlations  of  Agreement 
Scores (Figure 1).

In  summary,  both  (a)  the  computation  of  intercolumnar  correlations, 
and  (b)  Hierarchical  Classification  by  Reciprocal  Pairs  corrected  errors 
introduced  when  the  data  were  dichotomized  and  agreement  scores  computed 
to  represent  the  data. 

The  above  results  support  the  hypothesis  that  intercolumnar  correlations 
of  agreement  scores  between  people  are  more  valid  for  the  isolation  of  types 
than  are  the  agreement  scores  themselves. 

A  Statistical  Method  Generated  by  a  Hypothesis 
The  Hypothesis 

The  above  hypothesis  was  used  to  generate  another  hypothesis.  If  the 
first  intercolumnar  correlations  of  original  indices  of  a  matrix  enhance  the 
emergence  of  types  then  possibly  the  next  and  subsequent  computations  of 
intercolumnar correlations would facilitate still further the appearance
of  types  if  they  are  present  but  hidden  in  the  original  indices.  It  is 
therefore  hypothesized  for  further  study  that  iteration  of  intercolumnar 
correlations  generates  the  emergence  of  types  in  a  matrix  of  interassociations 
between  people  if  types  are  present  but  hidden  in  the  original  matrix. 

Testing  the  Hypothesis 

First Test. The same set of data was used in testing this hypothesis. The
standing of the eight companies on the 32 scales was used in lieu of dichotomized
data. The Pearsonian Coefficient of correlation for every company with every other
company was computed. The results are reported in Table 4.

The intercolumnar Pearsonian Coefficient of correlation was then computed for
every column with every other column of Table 4, to yield the first intercolumnar
matrix. The process was repeated on the first and subsequent intercolumnar matrices
until in the fifth matrix all entries became either plus one or minus one as shown
in Table 6.

Insert  Table  6  about  here 


Table  6  reflects  two  types,  ABCD  and  EFGH,  each  defined  by  a  submatrix  in 
which  all  entries  are  plus  one. 

The above procedure was applied to variables of each submatrix, using in each
case the original entries of correlation reported in Table 4. Again each submatrix
was divided into two smaller submatrices of plus one entries. The process isolated
Types AB and CD in the third intercolumnar table of Submatrix ABCD and Types EF and
GH in the fourth intercolumnar table of Submatrix EFGH.
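
The iterate-then-divide procedure can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the correlation matrix is invented, full columns (including the diagonal) are correlated, convergence is judged with a numerical tolerance, and pairs are not divided further. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def intercolumnar_matrix(matrix):
    """Correlate every column with every other column (full columns)."""
    n = len(matrix)
    cols = [[matrix[r][c] for r in range(n)] for c in range(n)]
    return [[1.0 if a == b else pearson(cols[a], cols[b]) for b in range(n)]
            for a in range(n)]


def split_once(matrix, tol=1e-6, max_iter=100):
    """Iterate until every entry is +1 or -1; return the two index groups."""
    current = matrix
    for _ in range(max_iter):
        current = intercolumnar_matrix(current)
        if all(abs(abs(v) - 1.0) < tol for row in current for v in row):
            first = [k for k, v in enumerate(current[0]) if v > 0]
            second = [k for k, v in enumerate(current[0]) if v < 0]
            return first, second
    raise RuntimeError("no convergence within max_iter iterations")


def divide(matrix, items):
    """Divide recursively, always restarting from the original entries."""
    if len(items) <= 2:          # assumed stopping rule: pairs are not split
        return items
    sub = [[matrix[a][b] for b in items] for a in items]
    first, second = split_once(sub)
    if not second:               # every entry converged to +1: a single type
        return items
    return [divide(matrix, [items[k] for k in first]),
            divide(matrix, [items[k] for k in second])]


# Hypothetical correlations among four variables (not the values of Table 4).
correlations = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
types = divide(correlations, [0, 1, 2, 3])
print(types)
```

Note that `divide` rebuilds each submatrix from the original entries before iterating again, as the text specifies, rather than reusing the converged matrix of plus and minus ones.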

The original correlations and the three submatrices of intercolumnar
correlations for Variables A, B, C, and D are shown in Table 7.


Insert  Table  7  about  here 

The above Iterative Intercolumnar Correlational Analysis produced the same
types as did both (a) the Hierarchical Classification by Reciprocal Pairs and (b)
the Rank Order Typal Analysis of the first intercolumnar correlations from
agreement scores. The results for the three analyses are shown in Figure 1.


Second Test. As a more crucial test of the ability of Iterative Intercolumnar
Correlational Analysis to yield types if operative in the data, the method was
applied to a set of data which earlier proved relatively resistant to pattern-
analytic methods. The data are particularly difficult to pattern-analyze because
they include many ties in crucial agreement scores.

"The  data  were  generated  by  requesting  a  subject  to  react  to  the  pictures  of 
20  art  objects,  by  using  adjectives  which  might  describe  them  (40  adjectives  were 
used).  For  each  art  object  the  subject  went  through  the  entire  list  of  adjectives 
before  proceeding  to  the  next  object.  The  subject  responded  by  saying,  in  effect, 
that  the  adjective  is  descriptive  of  the  object;  that  it  is  not  descriptive;  or  that 
she  could  not  decide  whether  or  not  it  is  descriptive.  If  the  subject's  initial 
response  to  a  picture  was  positive,  she  then  endorsed  one  of  three  alternative 
answers.  (1)  'I  like  the  characteristic  described  by  this  adjective,'  (2)  'I  do 
not  like  it,'  (3)  'I  can't  decide  whether  or  not  I  like  it.'  ... 


"An  agreement  score  (Zubin,  1938)  was  computed  for  every  object  with  every 
other  object.  Suppose  that  there  were  six  adjectives  and  two  objects  and  that  the 
subject  reported  the  following  reactions: 


                                 Adjectives

           1           2              3           4    5              6

Object A   Yes, Like   Yes, Dislike   No          No   Yes, Dislike   Yes, ?

Object B   Yes, Like   ?              Yes, Like   No   Yes, Dislike   Yes, ?


The agreement score between A and B for these six adjectives would be four, the
agreement being on Items 1, 4, 5, and 6 only.

"Similar computations across all 40 adjectives and among all 20 objects
yielded the 20 x 20 matrix of agreement scores shown in" Table 8 (McQuitty, Price,
and Clark, 1967).
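
The agreement score used in this illustration can be sketched in a few lines of Python. The sketch assumes that two reactions agree only when they are identical in full (so that "?" and "Yes, ?" count as different reactions); that reading is taken from the six-adjective example and is not a general definition of the Zubin score. The function name is hypothetical.

```python
def agreement_score(reactions_a, reactions_b):
    """Count the items on which two objects received identical reactions."""
    return sum(1 for a, b in zip(reactions_a, reactions_b) if a == b)


# Reactions to six adjectives, as in the example above.
object_a = ["Yes, Like", "Yes, Dislike", "No", "No", "Yes, Dislike", "Yes, ?"]
object_b = ["Yes, Like", "?", "Yes, Like", "No", "Yes, Dislike", "Yes, ?"]

score = agreement_score(object_a, object_b)
print(score)
```

Applying the same function to every pair of objects over all adjectives would fill a symmetric matrix of agreement scores of the kind reported in Table 8.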

Insert  Table  8  about  here 

Table 9 reports the first matrix of intercolumnar correlations and Table 10
reports the fifth matrix of intercolumnar correlations, viz., the first table in
which all entries were either plus or minus one, to yield Types CFIMPANQEKS and
JTRGBOHDL.

Insert  Tables  9  and  10  about  here 


Figure 3 shows the results from the complete analysis of the data by Iterative
Intercolumnar Correlational Analysis. These results show that the current method can
analyze with ease one set of data which has proven difficult for most methods because
the data involve tied values crucial for several required decisions.


Insert Figure 3 about here

General Evaluation. Every classificatory decision in the iterative method is
based on all of the indices of the matrix being analyzed as compared with primarily
only one index in the reciprocal pairs method. It is, therefore, hypothesized to be
both more reliable and valid than the latter method. These points need further
study.

A  Mathematical  Proof 


There is a more sophisticated approach to substantiating the hypothesis that
Iterative Intercolumnar Correlational Analysis will isolate types if operative in
the data, viz., to prove it mathematically.

The Generation of Plus One Intercolumnar Correlations. In the development of
the proof, a type is defined as a category of people of such a nature that everyone
in the category has a group of common characteristics, and anyone not in the category
does not possess all of these characteristics.

Assume now that we have a matrix of interassociations between people based on
test item responses which assess typal membership with validity better than chance.
Assume also that any variance in the responses to the test which is not attributable
to typal membership is governed by chance alone.

Let:

1) i and j be any two individuals of the same type.

2) $x_1, x_2, x_3, \ldots, x_N$ be any N individuals with no one of them specified in
any way as to typal membership.

The coefficients of correlation between these variables are indicated in
Table 11.


Insert Table 11 about here


Let N be infinitely large, so large that chance variation is ignored.

3) $r_{x_1 i} = r_{x_1 j};\quad r_{x_2 i} = r_{x_2 j};\quad r_{x_3 i} = r_{x_3 j};\quad \ldots;\quad r_{x_N i} = r_{x_N j}$

4) Let $I_{ij}$ = the intercolumnar correlation between i and j, i.e., the
correlation between corresponding entries of columns i and j
of Table 11.


If the entries for Columns i and j of Table 11 were known, the intercolumnar
coefficient could be computed by substituting the entries of Columns i and j in a
regular formula for computing the Pearsonian r.


Analogously, the symbols of Columns i and j can be substituted in a raw score
formula for computing r. This new formula is then the intercolumnar coefficient
$I_{ij}$. This new formula can be simplified by substituting either the $r_{x_s i}$ for the
corresponding $r_{x_s j}$ or the $r_{x_s j}$ for the corresponding $r_{x_s i}$ (from Equation 3).

5) In the first case $I_{ij} = 1$, except when $r_{x_1 i} = r_{x_2 i} = r_{x_3 i} = \cdots = r_{x_N i}$,
and in the latter case except for equality among all $r_{x_s j}$.


The above conditions would occur if and only if either all x's belong to the
same type or all x's had nothing in common with either i or j. The proof is developed
in detail elsewhere (McQuitty, **).
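
The substitution step can be written out as a short worked derivation. It is a sketch only, assuming the usual raw score formula for the Pearsonian r; the shorthand $a_s = r_{x_s i}$ and $b_s = r_{x_s j}$ is introduced here for compactness and is not notation from the reference article.

```latex
\begin{align*}
I_{ij} &= \frac{N\sum_s a_s b_s - \sum_s a_s \sum_s b_s}
               {\sqrt{\bigl[N\sum_s a_s^2 - (\sum_s a_s)^2\bigr]
                      \bigl[N\sum_s b_s^2 - (\sum_s b_s)^2\bigr]}} \\[4pt]
\intertext{Equation 3 gives $b_s = a_s$ for every $s$, so}
I_{ij} &= \frac{N\sum_s a_s^2 - (\sum_s a_s)^2}
               {\sqrt{\bigl[N\sum_s a_s^2 - (\sum_s a_s)^2\bigr]^2}} = 1,
\qquad \text{provided } N\sum_s a_s^2 - \Bigl(\sum_s a_s\Bigr)^2 \neq 0 .
\end{align*}
```

The excluded case, in which the quantity in brackets is zero, arises exactly when all the $a_s$ are equal; the coefficient is then of the form 0/0 and is undefined, which corresponds to the exception stated in 5).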


The proof means that Iterative Intercolumnar Correlational Analysis can isolate
the types reflected in a matrix of interassociation between people provided the
assumptions out of which the proof developed are satisfied. Whether or not they are
satisfied by the data is indicated by applying the method to the data. If the types
are isolated, a search for the common characteristics of the members of each type
will determine whether or not the assumptions have been satisfied. A cross-
validational study is required to investigate the stability of the types.

The Generation of Minus One Intercolumnar Correlations. Another proof must be
added to the above developments if the isolated types are to be easily recognized.

The  additional  proof  must  show  that  iteration  of  indices  for  variables  not  in  the 
same  type  will  not  move  them  to  plus  one  as  a  limit. 


Let:


Individuals u and v be any two individuals of "opposite" types. Two types
are "opposite" if they have no common characteristics; each type has its own
characteristics and lacks all of the characteristics of the other type. In data
where absent characteristics are relatively meaningless from the point of view of
the theory being applied, a more appropriate term would be independent rather than
"opposite" types.


In this case:

6) $r_{x_1 u} = -r_{x_1 v};\quad r_{x_2 u} = -r_{x_2 v};\quad r_{x_3 u} = -r_{x_3 v};\quad \ldots;\quad r_{x_N u} = -r_{x_N v}$


As before, by substituting in a raw score formula for the computation of the
Pearsonian Coefficient and simplifying, the result shows that the intercolumnar
correlation for two individuals of "opposite" types (over N other individuals not
all of the same type) equals minus one.

The above proofs show that two individuals of the same type will yield a plus
one and two individuals of "opposite" types will yield a minus one intercolumnar
correlation when computed over any N individuals of more than one type.

The Generation of Intercolumnar r's (< +1, > -1). Any two individuals, not of
the same type and not of "opposite" types will yield an intercolumnar correlation
of less than plus one and greater than minus one when computed over any N individuals.
This is because data of this kind cannot satisfy either Equation 3 or 6; the first of
these equations must be satisfied if the intercolumnar correlation is to be plus one
and the second equation must be satisfied if it is to be minus one.

The above developments show that the intercolumnar correlation between any two
individuals is less than plus one and greater than minus one, if computed over other
individuals all of the same type, and also when computed over individuals of different
types, except when the two individuals are either of the same or opposite types.
When the two individuals are of the same type, the intercolumnar correlation is plus
one, and it is minus one when they are of opposite types.
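
The three cases can be checked numerically. The sketch below uses invented correlations of five other individuals with i; the columns for j, v, and m are constructed to satisfy Equation 3, Equation 6, and neither equation, respectively. The names and values are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical correlations of N = 5 other individuals with individual i.
r_xi = [0.9, 0.7, 0.1, -0.2, -0.6]
r_xj = list(r_xi)                    # same type as i: Equation 3 holds
r_xv = [-r for r in r_xi]            # "opposite" type: Equation 6 holds
r_xm = [0.9, 0.1, 0.7, -0.6, -0.2]   # neither equation holds

same = pearson(r_xi, r_xj)
opposite = pearson(r_xi, r_xv)
mixed = pearson(r_xi, r_xm)
print(same, opposite, round(mixed, 3))
```

Up to floating-point rounding, the first two coefficients are plus one and minus one, and the third falls strictly between them.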

The  Reverse  Proofs 


The reference article (McQuitty, **) shows also that the proofs can be
reversed to show:

1.  If the intercolumnar correlation between two individuals i and j is plus
one, then they belong to the same type.

2.  If the intercolumnar correlation between two individuals x and y is minus
one, then they belong to "opposite" types, where "opposite" is defined to
mean that each type has characteristics of its own and each lacks all the
characteristics possessed by the other; there is no overlap of the typal
characteristics.

3.  If the intercolumnar correlation between two individuals, m and n, is less
than plus one and greater than minus one, then m and n have not been
proven to be members of either a single type or of "opposite" types.

The above developments show that iterative intercolumnar analysis can be used
to isolate types, as already illustrated in this paper with real data.

Further elaborations of Intercolumnar Correlational Analysis are reported
elsewhere (McQuitty, *** and ****).

Suggested  Advantages  of  the  Method 

Suggested advantages of the method are: (1) it is rapid, simple to program
for a computer, and can be applied to large sets of data when electronic computers
are used; (2) the method uses all available, pertinent data; (3) the analysis
proceeds by first dividing a matrix of associations between people (or other
objects) into major submatrices and then redividing these and subsequent submatrices
into smaller and smaller submatrices until all types are defined by submatrices;
(4) the method implicitly hypothesizes internally consistent types in data and
either substantiates or fails to substantiate the hypothesis; (5) the raw data are
required to be internally consistent within only broad chance limits; (6) the method
yields a simple structure (if the hypothesis of internally consistent types is
substantiated) where simple structure is defined to mean correlations of plus one
between all members of every type, and less than plus one down to and including minus
one between types.

Limitations  of  the  Method 


Even Iterative Intercolumnar Correlational Analysis, with all of its suggested
advantages, does not solve all of the problems in the isolation of types.

One particularly difficult problem is the fact that indices of association
between people vary with the test items used in assessing them. Consequently, the
types into which people classify vary with the test items used in assessing the
people.


The Problem of the Single Response by the Single Subject

In  an  effort  to  solve  this  problem,  I  have  addressed  myself  first  to  a 
simpler  and  more  fundamental  problem,  the  problem  of  interpreting  a  single  response 
by  a  single  subject. 

"One  of  the  problems  of  interpreting  a  response  to  an  item  of  a  test  is  that  it 
can  be  assigned  various  meanings  depending  on  both  who  gives  it  and  the  other 
responses  (to  other  items)  with  which  it  occurs. 

"A  single  response  with  variable  meanings  can  be  found  to  have  stability  in 
psychological  space  if  it  can  be  assigned  to  a  combination  of  responses  which  has 
stability. The response can, however, still have a kind of variability, for it
might be assigned to several combinations of responses, and each of them might
have stability.

"In  other  words,  I  attempt  to  account  for  the  variability  of  meaning  of  a 
response  by  assigning  it  to  several  combinations  of  responses,  each  of  which  has 
stability in theoretical psychological space.

"The term psychological space is used to emphasize the possibility that
identical responses (objectively) might prove to have various psychological meanings.

Inter- and Intra-Individual Differential Validity

"Elsewhere, I have used the term differential validity to refer to the
possibility that a response might assess different attributes in different persons
(McQuitty, 1959).

"Differential validity is involved (as illustrated in Figure 4) when a given
Response i is endorsed along with Responses j, k, and l by one category of subjects,
A, to indicate Type X and the same objective Response, i, is endorsed along with
Responses r, s, and t by another category of subjects, B, to indicate Type Y. In
the first case, endorsement of Response i indicates Type X and in the latter case,
Response i indicates Type Y.


Insert Figure 4 about here

"The  present  paper  refines  further  the  concept  of  differential  validity  by 
introducing two forms of it, inter- and intra-individual.

"Inter-differential  validity  is  now  used  to  mean  what  we  intended  originally 
by  differential  validity,  as  summarized  above. 

"We  recognize  now  the  possibility  that  a  response  may  be  applied  in  a  typological 
theory  to  assist  in  assessing  various  attributes  in  the  same  individual,  depending 
upon  the  other  combination  of  responses  with  which  it  is  interpreted. 

"In  introducing  intra-differential  validity,  let  us  suppose  (as  illustrated  in 
Figure 4) that a third category of subjects, C, is formed by combining the members of
each of Categories A and B; they are portrayed by the common Responses i and z, and
they indicate Type Z.

"A  type  (such  as  X,  Y,  or  Z)  is  defined  by  all  of  the  common  ways  in  which  the 
members of the type behave. Consequently, each Type X, Y, and Z, would differ from
each of the other two types.

"From a typological point of view (which classifies people in terms of
combinations of responses), Response i with Responses j, k, and l indicates Type X; i with
r, s, and t indicates Type Y; and i with z only indicates Type Z. Response i would
have various meanings within the same individual depending on the combination of other
responses with which it is interpreted. This is what we mean by intra-differential
validity, a single response assessing various attributes in the same individual
depending on the other responses with which it is interpreted."

The  Problem  of  a  Set  of  Responses  by  a  Single  Individual 

"A  set  of  responses  by  an  individual  to  the  items  of  a  test  invites  scientific 
explanation  and  understanding  in  the  same  fashion  as  does  the  single  response  to  a 
single item. A set of responses has additional attractive characteristics: (1) the
set can be used to assist in assigning meaning to individual responses, and (2) the
set of responses can possibly be assigned to sub-sets which have relatively stable
meanings.

"The problem is to devise methods for isolating all of the major and meaningful
sub-sets to which the responses of an individual to a theoretically meaningful test
can be assigned.

"In  summary,  a  response  may  possibly  have  different  assessment  indicants  for 
each  major  and  meaningful  combination  of  responses  to  which  it  can  be  assigned. 

Two  Solutions 


"Two  kinds  of  pattern  analyses  (or  factor  analyses)  of  the  responses  by  a  single 
individual  can  be  recognized:  (a)  individual  based,  and  (b)  group  based. 

Individual Based


"In the first instance, the investigator gathers data in such a fashion that
he can compute an index of the interrelation of every response by a subject to every
other response by that subject, without the use of a reference group.

"The  following  test  items  from  a  study  now  in  progress  fulfill  the  above 
requirement  and  illustrate  the  kind  of  data  required  for  one  kind  of  a  statistical 
analysis of a single individual, viz., an individually based approach:

... is possible to compute an index of the extent to which an individual responded to
angel in the same way as he did to devil.

"Using many words (in the same fashion as illustrated above for angel and devil)
one can compute a matrix of interassociations between selected concepts over selected
emotions. The matrix can be pattern analyzed (or factor analyzed) by any one of the
many available methods. Examples of the above approach are found in studies by
Schubert (1965) and McQuitty, Price, and Clark (1967)." (McQuitty, *).

The  above  described  test  and  its  method  of  analysis  both  grew  out  of  a  theory 
of  the  nature  of  both  mental  illness  and  mental  health.  The  approach  is  described 
elsewhere  (McQuitty,  Abeles,  and  Clark,  study  in  progress). 

Group Based


"Assumptions. A series of assumptions suggests and justifies a solution to
the  problem  of  isolating  the  major  patterns  of  responses  of  a  single  individual  to  the 
items  of  a  test,  as  these  are  reflected  in  the  responses  of  a  group  of  subjects. 

"We  assume  that  every  individual  is  an  imperfect  representative  of  one  or  more 
pure  types.  If  two  or  more  imperfect  representatives  of  the  same  pure  type  are 
considered  jointly,  they  give  a  better  picture  of  the  pure  type  than  any  one  of 
them  separately. 

"If  an  individual  is  representative  of  n  pure  types,  then  in  order  to  give  a 
comprehensive,  typological  picture  of  the  individual,  he  must  be  treated  jointly  with 
at  least  one  other  representative  of  each  of  these  pure  types. 

"If  a  set  of  responses  by  an  individual  is  to  be  understood  from  a  group-based 
typological  point  of  view,  then  we  require  a  classification  of  the  individual  with 
one  or  more  members  of  each  type  represented  in  the  set  of  responses  by  the  single 
individual.  The  classification  is  more  helpful  if  it  specifies  the  responses  which 
classify the individual into each of the types he represents.

"Method. The goals implied above can be easily realized by any one of many
pattern-analytic  methods  (McQuitty,  1967),  provided  only  that  a  simple  operation 
be  introduced  at  the  beginning  of  the  analysis. 

"Suppose  that  we  wish  to  study  the  pattern  of  responses  of  Individual  A  to  the 
items of Test X. One approach is to administer the test to 100 other individuals,
representing a universe which is meaningful in an effort to understand Individual A.

"The  novel  operation  required  by  the  approach  of  this  paper  is  as  follows: 

Pair the pattern of responses of Individual A with those of each of the 100 other
individuals to yield Pairs: A1, A2, A3, ..., A100. Specify new patterns for each
of the 100 pairs, by taking the common responses of each pair. For example, if the
responses by Individual A and Individual 20 were as shown in Table 12*, then Pattern
A-20 (for Individuals A and 20 treated jointly) would be + on Item 1, - on 2, - on 5,
+ on 6, + on 7, and - on 9, with Items 3, 4, and 8 omitted because Individuals A and
20 disagree on each of these latter three items.
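
The pairing operation amounts to keeping only the responses two individuals share. A minimal Python sketch follows; Table 12 is not reproduced here, so the responses on the disagreeing items are invented, and only the agreements follow the example in the text. The function name `common_pattern` is hypothetical.

```python
def common_pattern(responses_a, responses_b):
    """Keep only the items on which both individuals gave the same response."""
    return {item: resp for item, resp in responses_a.items()
            if responses_b.get(item) == resp}


# Hypothetical responses to nine items; the disagreements are invented.
individual_a = {1: "+", 2: "-", 3: "+", 4: "-", 5: "-",
                6: "+", 7: "+", 8: "+", 9: "-"}
individual_20 = {1: "+", 2: "-", 3: "-", 4: "+", 5: "-",
                 6: "+", 7: "+", 8: "-", 9: "-"}

pattern_a_20 = common_pattern(individual_a, individual_20)
print(pattern_a_20)
```

Repeating the call for each of the other individuals would give the full set of paired patterns that the subsequent pattern analysis takes as its input.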


Insert  Table  12  about  here 


Illustration 


"In  order  to  illustrate  the  isolation  of  major  response  patterns  for  a  single 
individual,  we  have  chosen  to  analyze  the  course  selections  in  psychology  by  a  single 
individual  in  relation  to  the  course  selections  in  psychology  by  the  135  other  majors 
in  that  discipline,  who  graduated  at  Michigan  State  University  during  the  academic  years 
1961-62  and  1962-63. 

"During  his  four  years  of  college,  the  one  subject  chosen  for  analysis  (Code  #83) 
registered in and obtained grades in" (McQuitty, *) 17 psychology courses on the
quarter system. In addition to the above courses, the 135 other students majoring
in psychology completed and received grades in one or more of 23 other quarter-length
courses.

"The  purpose  is  to  classify  the  course  selections  by  Subject  A  into  their  major, 
meaningful  patterns,  using  the  course  selections  by  the  other  135  students  of  the 
study as the source of information which enables us to accomplish the task.

"Individual #83 is first paired with each of the 135 other individuals of the
study to yield Pairs 83-1, 83-2, 83-3, ..., 83-136 (omitting 83-83). Then the courses
selected jointly by the members of each pair are determined to yield patterns 83-1,
83-2, 83-3, ..., 83-136 (omitting 83-83).

"An agreement score is computed between every pattern with every other pattern.
For example, if Patterns 83-1 and 83-2 include the course selections shown in
Table 13, then their agreement score for these five courses would be 3, the number
of courses which occur in each of the two patterns (specifically courses 2, 8 and 16).


Insert Table 13 about here


"Using  the  agreement  scores,  a  matrix  was  prepared  to  show  the  agreement  score 
of  every  pattern  with  every  other  pattern. 
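
As a rough sketch of this step, the agreement score can be treated as the size of the intersection of two course sets, and the matrix as a table of all pairwise scores. The course codes below are hypothetical (Table 13 is not reproduced here); they are chosen only so that Patterns 83-1 and 83-2 cover five courses and share courses 2, 8 and 16. The function names are our own.

```python
from itertools import combinations


def agreement_score(pattern_p, pattern_q):
    """Number of courses that occur in both patterns."""
    return len(set(pattern_p) & set(pattern_q))


def agreement_matrix(patterns):
    """Symmetric table of agreement scores; the diagonal is left as None."""
    labels = sorted(patterns)
    matrix = {a: {b: None for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        matrix[a][b] = matrix[b][a] = agreement_score(patterns[a], patterns[b])
    return matrix


# Hypothetical joint patterns (assumed course codes, for illustration only).
patterns = {
    "83-1": {2, 5, 8, 16},
    "83-2": {2, 8, 11, 16},
    "83-3": {5, 8},
}
matrix = agreement_matrix(patterns)
print(matrix["83-1"]["83-2"])
```

With 135 joint patterns the same function would produce the full 135 by 135 matrix that is then pattern analyzed.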

"Single  Hierarchical  Analysis  by  Reciprocal  Pairs  was  applied  to  pattern  analyze 
the Matrix (McQuitty, 1966a). Five individuals were chosen at random from the above
group  of  136  subjects,  and  the  results  from  each  of  them  were  analyzed  separately 
by  the  above  methods. 
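
The full hierarchical procedure is given in McQuitty (1966a) and is not reproduced here. The sketch below illustrates only what we take to be its opening move, namely locating reciprocal pairs: two patterns each of which has its highest agreement score with the other. The scores are hypothetical, the function name is our own, and ties are resolved arbitrarily by taking the first maximum, which is an assumption rather than a rule stated in the text.

```python
def reciprocal_pairs(scores):
    """Return pairs (a, b) where a's best partner is b and b's best partner is a."""
    best = {label: max(row, key=row.get) for label, row in scores.items()}
    pairs = {tuple(sorted((a, b))) for a, b in best.items() if best[b] == a}
    return sorted(pairs)


# Hypothetical agreement scores among five patterns (diagonal omitted).
scores = {
    "P1": {"P2": 9, "P3": 4, "P4": 2, "P5": 8},
    "P2": {"P1": 9, "P3": 5, "P4": 3, "P5": 6},
    "P3": {"P1": 4, "P2": 5, "P4": 7, "P5": 1},
    "P4": {"P1": 2, "P2": 3, "P3": 7, "P5": 1},
    "P5": {"P1": 8, "P2": 6, "P3": 1, "P4": 1},
}
print(reciprocal_pairs(scores))
```

Here P5 agrees best with P1, but P1 agrees best with P2, so P5 forms no reciprocal pair at this step and would be attached only at a later level of the hierarchy.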

Results


"Four  of  the  five  individuals  analyzed  yielded  four  and  only  four  clusters. 
The other individual (Code #83) produced five clusters. We elected to describe
results from this individual in detail because he shows less interrelation with
the other 135 individuals (in terms of the number of clusters): if his results
are meaningful, then those of the others are likely also to be meaningful.

"The results for Individual 83 are shown in Figures 5 - 9 and Tables 14 - 18.
Figure 5 portrays Clusters 1 and 2. Figures 6 and 7 report Clusters 3 and 4
respectively, and Figures 8 and 9 each report approximately one-half of Cluster 5.

Insert Figures 5 - 9 and Tables 14 - 18 about here


"In  the  first  step  of  the  analysis  of  the  matrix,  Individual  6c  joined 
Individual 130 as shown in Cluster 2, Figure 5. Individual 5 joined Individual 39
(Cluster 5, Figure 8) and Individual 21 joined Individual 125 (Cluster 5, Figure 9).
The members of each pair agreed in having selected 11 courses in common, but not
common from pair to pair.

"Since only course selections used by Individual 83 were included in the
analysis, Individual 83 is included in each of the above pairs and in every other
combination of individuals as shown by the intersection of lines throughout the
figures.

"Table  14,  for  example,  lists  certain  courses  (titles  and  code  numbers) 
involved in Cluster 1. The body of the table shows courses which are common to each
major intersection point; Courses 2, 5, 8, and 28 were selected by Subjects 35, 36,
129, and 48 to yield Intersection Point A, as shown in Figure 5, and Table 14.
Intersection points involving more than five courses (but relatively few students)
were omitted. Courses are reported in the tables for all of the intersection points
which are labeled by capital letters in the figures. The other tables are interpreted
in an analogous fashion.

"In  addition  to  intersection  points  reflecting  patterns  of  course  selection, 
whenever  a  series  of  points  is  joined  by  a  straight  line  parallel  to  the  base  line, 
all subjects of the points thus connected selected a single pattern of courses.

"One individual, Code #68, failed to appear in the analysis. He took only one
course at Michigan State University in common with Individual 83. He transferred to
MSU after having received credit in psychology courses elsewhere; the records do
not show the specific MSU courses for which he received credit upon transfer. He
was not an appropriate member of a universe in terms of which to study Individual 83.
We  left  such  individuals  in  the  study  because  there  were  only  a  few  of  them  and  we 
wished  to  indicate  that  they  would  not  have  a  major  effect. 

Interpretation

"Clusters 2, 3, and 4 appear to be more meaningful than Clusters 1 and 5.
Cluster 2 portrays a central interest in personality-clinical as related to
psychology in business. Cluster 3 reflects an interest in individual differences
in personality. Cluster 4 seems to be concerned primarily with understanding the
dynamics of the developing individual. Cluster 5 seems to be concerned primarily
with the understanding of personality from a more general point of view as contrasted
with a more dynamic, developmental point of view in Cluster 4.

"Cluster  1  appears  to  encompass  personality  from  an  experimental  point  of  view; 
Course 8, Learning and Motivation, is an experimental course.

"Individual 83 appears to center his interest in personality or courses
related  thereto.  This  point  is  further  substantiated  by  referring  back  to  the 
courses  which  the  subject  did  not  select;  they  are  less  concerned  with  personality 
than  are  the  courses  which  he  selected. 

"We  conclude  that  the  clusters  are  in  general  meaningful,  and  that  the  method 
has  possible  values  as  indicated  further  by  the  following  development  of  the  method. 

Differential Pattern Analysis

"The  method  can  be  expanded  to  do  for  patterns  what  item  analysis  does  for  items. 
Item  analysis  selects  the  items  most  highly  related  to  a  criterion.  Analogously, 
our  method  can  be  expanded  to  select  the  combination  of  patterns  which  differentiate 
in  a  fashion  most  similar  to  an  outside  criterion.  The  expanded  method  is  called 
Differential Pattern Analysis.

"Suppose,  for  example,  that  our  problem  were  to  isolate  the  major  patterns 
which  would  best  differentiate  fifty  mental  patients  from  fifty  normals  on  a  test 
of  100  items.  In  this  case,  we  would  proceed  for  each  subject  (patients  and  normals), 
in the same fashion as we did above for Subject 83; we would determine the major
patterns for each patient in terms of other patients and for each normal in terms of
other normals. This step would yield a set of patterns derived from patients and
another set derived from normals.

"Using both patient patterns and normal patterns, we would compute the agreement
score of every pattern with every other pattern and place them in a matrix.
We would then select those patterns uniquely characteristic of either patients or
normals. A pattern analysis of the matrix would facilitate this operation.

"A  cross  validation  would  be  required  to  determine  the  ultimate  value  of  the 
selected  patterns  for  obtaining  the  desired  differentiation  between  patients  and 
normals  ... 


The Variability of Categories into which People Classify

"A  further  consideration  of  the  above  methods  emphasizes  an  important  and 
fundamental  problem:  The  categories  of  persons  with  which  any  given  person  can 
be  classified  is  a  function  of  both  the  test  items  in  terms  of  which  the  person 
is assessed and the group of persons with whom he is compared." (McQuitty, *).

Summary 

To  understand  a  single  response  by  a  single  individual  is  a  fundamental  problem. 
This  problem  emphasizes  that  we  must  decide  from  theory  or  some  other  point  of  view 
both (a) with what other responses is it helpful to interpret the given response
and (b) with what other individuals is it helpful to interpret the behavior of the
given individual. The effectiveness of our interpretation of the single response
by the single individual depends on the validity of our choices in selecting other
responses and other individuals with which to compare those of the single individual.

Once the above decisions have been consummated with high validity, then the
methods of this paper and other similar methods are helpful in isolating meaningful
personality structures.


Two  especially  effective  methods  of  pattern-analysis  are  described  in  this 
paper. Both methods begin with a matrix of interassociations between people (or
other objects). One method searches for internally consistent submatrices. These
are usually small, each consisting of only a few individuals. They are initial
indicators of statistical types, which are relatively hidden in the data. They
are  analyzed  in  a  fashion  which  clarifies  their  appearance  and  develops  them  into 
larger  types.  Through  this  procedure  a  hierarchical  classification  of  statistical 
types  is  constructed  from  the  bottom  up. 


By  way  of  contrast,  Intercolumnar  Correlational  Analysis  builds  the  hierarchical 
structure from top down. It divides a matrix into two or more submatrices, at
least one of which represents a statistical type. It continues by dividing and
redividing submatrices until at the bottom every person is represented as an
individual  type.  This  method  has  the  advantage  of  using  all  indices  of  every 
matrix  or  submatrix  in  making  its  classifications,  while  crucial  decisions  in  the 
above  method  are  based  primarily  on  only  a  few  indices. 
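
The top-down division can be illustrated with the four-company correlations reported in Table 7. The sketch below assumes that an "intercolumnar matrix" is obtained by correlating every column of the current matrix with every other column, and that this is repeated until the entries approach +1 or -1; the function names are our own, and only a single division is shown, not the repeated redivision of submatrices.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercolumnar(matrix):
    """Correlate every column of a square matrix with every other column."""
    columns = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Original correlations among Companies A, B, C, D (Table 7).
original = [
    [1.0000, 0.8530, -0.0590, -0.1410],
    [0.8530, 1.0000, -0.1170, -0.0030],
    [-0.0590, -0.1170, 1.0000, 0.5640],
    [-0.1410, -0.0030, 0.5640, 1.0000],
]

first = intercolumnar(original)  # first[0][1] is about 0.9696, as in Table 7

converged = original
for _ in range(5):  # five passes is an arbitrary stopping choice for this sketch
    converged = intercolumnar(converged)

# Split the companies by the sign of their converged correlation with Company A.
labels = "ABCD"
with_a = [labels[j] for j in range(4) if converged[0][j] > 0]
against_a = [labels[j] for j in range(4) if converged[0][j] <= 0]
print(with_a, against_a)
```

Because every column enters each correlation, the split draws on all the indices of the matrix, which is the advantage claimed for the method in the paragraph above.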
Footnote


Throughout  this  paper,  quotations  are  included  which  refer  to  tables  and 
figures.  Whenever  necessary,  in  order  to  make  the  code  number  correspond 
to the order in which tables and figures appear in the paper they have
been  changed  within  the  quotation. 


Appreciation  is  expressed  to  Multivariate  Behavioral  Research  for 
permission  to  quote  from  McQuitty,  Louis  L.,  Group  Based  Pattern  Analysis 
of  the  Single  Individual,  in  press  (Letter  from  the  Editor,  Dr.  Desmond 
S. Cartwright, to the author, dated 16 February 1967).
Table 1

Agreement Scores between Companies*

           A         B         C         D         E         F         G         H
A         --        12        16        16        14         6        11         7
B         12        --        17        17        13         6         8        10
C         16        17        --        26        10         8         9        13
D         16        17        26        --        10        12        11        11
E         14        13        10        10        --  [illeg.]        17        13
F          6         6         8        12  [illeg.]        --        19        17
G         11         8         9        11        17        19        --        24
H          7        10        13        11        13        17        24        --

*Data from McQuitty, 1954
Table 2

Agreement Scores between Hierarchical Types

        AB    CD    EF    GH
AB      --    13     4     5
CD      12    --     4     6
EF       4     4    --    10
GH       5     6    10    --


McQuitty


Table 3

Agreement Scores of Table 1 Converted to Ranks within Columns

       A     B     C     D     E     F     G     H
A      1     2     4     4     4    7½    5½     8
B      2     1     3     3    5½    7½     8     7
C     3½    3½     1     2    7½     6     7    4½
D     3½    3½     2     1    7½     5    5½     6
E      5     5     6     8     1     2     4    4½
F      8     8     8     5     2     1     3     3
G      6     7     7    6½     3     3     1     2
H      7     6     5    6½    5½     4     2     1
Table 4

Pearsonian Coefficients of Correlation between the Companies
Based on Raw Scores

         A        B        C        D        E        F        G        H
A  +1.0000  +0.8530  -0.0590  -0.1410  -0.3260  -0.4880  -0.4880  -0.5310
B  +0.8530  +1.0000  -0.1170  -0.0030  -0.4360  -0.5230  -0.5380  -0.4170
C  -0.0590  -0.1170  +1.0000  +0.5640  -0.2060  -0.4520  -0.3010  -0.2420
D  -0.1410  -0.0030  +0.5640  +1.0000  -0.3960  -0.2660  -0.2940  -0.2920
E  -0.3260  -0.4360  -0.2060  -0.3960  +1.0000  +0.4600  -0.0150  -0.0450
F  -0.4880  -0.5230  -0.4520  -0.2660  +0.4600  +1.0000  +0.1810  +0.0850
G  -0.4880  -0.5380  -0.3010  -0.2940  -0.0150  +0.1810  +1.0000  +0.4610
H  -0.5310  -0.4170  -0.2420  -0.2920  -0.0450  +0.0850  +0.4610  +1.0000
Table 5

The Pearsonian Coefficients of Table 4 Converted to Ranks within Columns

     A    B    C    D    E    F    G    H
A    1    2    3    4    6    7    7    8
B    2    1    4    3    8    8    8    7
C    3    4    1    2    5    6    6    5
D    4    3    2    1    7    5    5    6
E    5    6    5    8    1    2    4    4
F    7    7    8    5    2    1    3    3
G    7    8    7    7    3    3    1    2
H    8    5    6    6    4    4    2    1
Table 6

Fifth Intercolumnar Matrix of Table 4

      A    B    C    D    E    F    G    H
A    +1   +1   +1   +1   -1   -1   -1   -1
B    +1   +1   +1   +1   -1   -1   -1   -1
C    +1   +1   +1   +1   -1   -1   -1   -1
D    +1   +1   +1   +1   -1   -1   -1   -1
E    -1   -1   -1   -1   +1   +1   +1   +1
F    -1   -1   -1   -1   +1   +1   +1   +1
G    -1   -1   -1   -1   +1   +1   +1   +1
H    -1   -1   -1   -1   +1   +1   +1   +1



Table 7

The Emergence of Types AB and CD

Original Correlations

         A        B        C        D
A  +1.0000  +0.8530  -0.0590  -0.1410
B  +0.8530  +1.0000  -0.1170  -0.0030
C  -0.0590  -0.1170  +1.0000  +0.5640
D  -0.1410  -0.0030  +0.5640  +1.0000

First Intercolumnar Matrix

         A        B        C        D
A  +1.0000  +0.9696  -0.9122  -0.9587
B  +0.9696  +1.0000  -0.9650  -0.8885
C  -0.9122  -0.9650  +1.0000  +0.7632
D  -0.9587  -0.8885  +0.7632  +1.0000

Second Intercolumnar Matrix

         A        B        C        D
A  +1.0000  +0.9987  -0.9936  -0.9970
B  +0.9987  +1.0000  -0.9979  -0.9920
C  -0.9936  -0.9979  +1.0000  +0.9819
D  -0.9970  -0.9920  +0.9819  +1.0000

Third Intercolumnar Matrix

         A        B        C        D
A  +1.0000  +1.0000  -1.0000  -1.0000
B  +1.0000  +1.0000  -1.0000  -1.0000
C  -1.0000  -1.0000  +1.0000  +0.9999
D  -1.0000  -1.0000  +0.9999  +1.0000
Common Course Selection in Cluster 4, Figure 7
Principles of Behavioral Taxonomy and the Mathematical

University of Illinois

and Malcolm A. Coulter

1. Two Concepts of Type: Homostat and Segregate

Placing people in types is an ancient pastime; but one still far from
being fully understood in respect of both conceptual aims and methods of
analysis. For example, the reciprocal relation of typing to the description
and prediction by attributes and dimensions, discussed in the earlier
Q-technique controversies (Burt, 1937; Cattell, 1951; Stephenson, 1936), yet
remains to be properly worked out. To this day, the conceptual basis for
types has remained crude compared to that developed clearly for attributes
(by surface and source traits (Cattell, 1946), as defined in modern statistical
models (Burt, 1950; Horst, 1960; Thurstone, 1947; Tucker, 1964)).

Elsewhere (Cattell, 1957), a list has been given of the rank and riotous
verbal usages of "type". Such use as in Jung (1923), and many others who
define types as the arbitrarily cut extremes of any bipolar continuous
dimension, we shall set aside as more aptly handled by direct measurements on
bipolar source traits. What we wish to designate as a type is the formal entity
central to much psychological and biological classification, embodied in the
last by the concept of a taxon, e.g., species, genus, family, etc. (In
psychology, we need not necessarily adopt the biologist's further concern with
"dendrograms," i.e., the arrangement in classificatory hierarchies.) Types
appear in psychology as groupings by occupational skill, complexes of attitude
in political groups, pathological syndromes, and by certain genetically
determined patterns of behaviour.

Psychometrics has, in its main developments, ignored this granulation of
its populations in favor of a simplified world of homogeneous normal
distributions of characteristics and linear relations between them. Over the
normal ranges of behaviour, the approximation has been good enough to permit the
effective prediction of individual difference by means of broad personality
factors. But as research broadens, the realities of more complex natural
distributions demand to be considered. Considerations of efficiency require
that our models begin more explicitly to encompass types, and the non-linear
relations and pattern effects which go with them.

We shall, therefore, begin with the central, if initially over-simplified,
definition of a type as the most representative pattern in a group of individuals
located by a high relative frequency (a mode) in the distribution of persons
in multidimensional space. This definition will be made more stringent as we
proceed. The principal possibilities are illustrated for one and two dimensions
by Figure 1:


(Insert Figure 1 here)

Figure 1 is intended to bring out that: (1) non-normal, multimodal
groupings can easily exist in a multivariate distribution even when the
distribution projected on any one of the dimensions is virtually normal; (2) all
modes are relative as to density, so that, as at A1, A2; B1, B2; C1, C2; and
D1, D2 in Figure 1, one can have "types within types"; and (3) that there are
really two distinct possible definitions of type, one hinging on (a) high mutual
similarity of members, i.e., all coming within a circumscribed distance of one
another as illustrated by those lying in the dotted circles 1, 2, 3, and 4, and
(b) forming part of a group in which, though some members may extend to remote
distances from others, each is less remote from another member of that group
than from individuals outside the group, e.g., as shown by the types B and C.
Thus, persons in the regions A, C1, and B1 constitute two types, 1 and 2,
according to definition (a), whereas they fall into three types, A, C1, and B1,
according to definition (b).

A definition with Euclidean or Boolean rigour for these two concepts will
be given later, but on the temporarily adequate basis already given, we shall
refer to them respectively by the term homostat, meaning "a set of people
standing at closely similar positions in space", and segregate, implying "a set
consisting of people continuously related through other people in the set and
isolated from those outside, but not necessarily similar in position, i.e.,
not of high homogeneity". Readers may find it convenient, as we have in our
own laboratory discussions, to designate them "stat" and "ait" respectively.
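
The contrast between the two concepts can be made concrete with a small sketch. It is not the rigorous definition promised for later in the paper: it assumes that "closely similar" and "continuously related" can both be expressed through a single fixed distance threshold, and the points, the threshold and the function names are all our own illustrative choices.

```python
from itertools import combinations
from math import dist


def is_homostat(points, radius):
    """True when every member lies within `radius` of every other member."""
    return all(dist(p, q) <= radius for p, q in combinations(points, 2))


def segregates(points, link):
    """Group points that are joined through chains of neighbours closer than `link`."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        stack, group = [start], {start}
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= link}
            unvisited -= near
            group |= near
            stack.extend(near)
        groups.append(sorted(group))
    return sorted(groups)


# A strung-out chain of people and a compact blob, in a two-dimensional space.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
blob = [(10, 10), (10.5, 10), (10, 10.5)]
people = chain + blob

print(segregates(people, link=1.2))     # two segregates: the chain and the blob
print(is_homostat(chain, radius=1.2))   # the chain is a segregate but no homostat
print(is_homostat(blob, radius=1.2))    # the blob is both
```

The chain is the case of interest: its end members are remote from one another, yet each is linked to the set through its neighbours and isolated from everyone outside it.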

A glance at past psychological work on types, e.g., McQuitty's pattern
analysis (1963), that of Nunnally (1962), Overall (1964), and of ourselves with
the pattern similarity coefficient (1949, 1950, 1952), shows that attention has
hitherto been operationally directed exclusively to homostats, despite the
concept of segregates having sometimes been obviously present in the writer's mind.

2.  The  Most  Promising  Model,  From  a  Scientific  Standpoint 

The main aims in research on types are: (1) To produce a methodology
for operationally locating and identifying segregates and homostats. (2) To
develop mathematico-statistical formulae, based on improved models of type, for
utilising test results for predictive purposes and for investigating laws which
may arise from the peculiar nature of types.

Briefly to anticipate what this second step may comprise, we would point
out that Aristotelian classification permits predictions of the kinds "This is
a dog; therefore it may bite"; or "This is a schizophrenic; therefore the
prospect of remissions is not high." In other words, a classification by
variables of one kind may permit prediction on others not included in the
immediate observations. As will be brought out later, the use of types need not
stop with this Aristotelian, categorical formulation. It will lead rather to
the recognition that in numerical data, the relation of a "test" to a "criterion"
may be very different within species from that obtaining between species. Thus,
the relation of two variables could be nonlinear across all individuals in the
total genus, yet exactly linear within each species. The use of distinct in-
and between-type dimensions instead simply of a single set of broad dimensions
across a genus demands that before data is fed to the computer, one has to
consult an encyclopedia (to recognize, by appropriate properties, each individual's
belonging to a particular species). The reward, however, of this classificatory
labour is likely to be a more accurate prediction from the individual's scores,
or the discovery of clearer laws for the segregated types, obscured in the
mixture of species in the genus.
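
A toy calculation may make the point about within-species and across-genus relations concrete. The two "species" below are invented for illustration: the criterion rises exactly linearly with the test in one and falls exactly linearly in the other, so that a single straight line fitted across the genus describes neither.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation(pairs):
    """Correlation between test and criterion over a list of (test, criterion) pairs."""
    tests, criteria = zip(*pairs)
    return pearson(tests, criteria)


# Hypothetical (test, criterion) scores for two species within one genus.
species_1 = [(x, 2 * x) for x in range(1, 6)]        # criterion rises with test
species_2 = [(x, 30 - 2 * x) for x in range(6, 11)]  # criterion falls with test

r_within_1 = correlation(species_1)            # +1 up to rounding
r_within_2 = correlation(species_2)            # -1 up to rounding
r_pooled = correlation(species_1 + species_2)  # about 0.71 across the genus

print(r_within_1, r_within_2, round(r_pooled, 2))
```

Prediction from the pooled line is imperfect for every individual, whereas prediction within a species, once the individual has been assigned to it, is exact in this contrived case.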

As we proceed to more precise concepts for both discovering and using
types, it is necessary (since particular exemplifications in, for example,
zoology, psychology, astronomy, mineralogy are likely to differ) to define the
breadth of our approach. Our aim is to be comprehensive (our association with
Sokal and Sneath (1963) and Sades (1964) in applications to entomology has been
encouraging and enlightening in this respect) and we believe that the
psychologist, before he devotes his ingenuity to statistics, would do well to take a
philosophical pause, for he needs to develop a plausible scientific (not merely
a statistical) model of types, based on speculation as to how and why they arise.
Briefly, our theory is that types arise from three causes: (1) Adaptive Success,
because of special value in the combination (survival value in biology, utility
of human artefacts), (2) Combinations Required by Natural Law, where a pattern
repeats itself modally because it is required by a particular combination of
natural laws, e.g., crystalline forms, cloud types, solar systems, and (3)
Bio-social Gravitation. This supposes that once the beginnings of a type exist
there will be a tendency by imitation for individuals to gravitate towards its
centroid. This occurs socially in fashions and fads and biologically in species
formation. (Sewall Wright's "genetic drift" has relations to the latter.)
Obviously, psychology has types of all three kinds: the skill and personality
patterns of different occupations are examples of functional adaptations; the
behaviour patterns of delirium tremens or Huntington's chorea have no adaptive
value and occur simply as inevitable patterns from laws of neurological
breakdown, etc.; cultural and racial types relate to the third source.

All three sources indicated by this theory of type origins would result
in some combinations of parameters being represented by high (modal) frequencies
while other zones (combinations) in the coordinate system, which theoretically
might be filled, remain empty of individuals. Parenthetically, it will be
mentioned that in functional adaptations dependent on either evolution or human
invention, the additional possibility must be considered that some zones are
unoccupied not because they necessarily represent a non-functional combination,
but because for some reason they cannot be, or have not yet been, reached. In
biology, the intermediate mutational steps necessary to reaching some
advantageous end pattern may be chemically unstable or biologically lethal. The
giraffe's neck had time to grow gradually as his forelegs grew, so that he
achieved the advantages of height without losing his capacity to drink; but
other useful biological combinations might be too much of a "tour de force."
The matrix of scientific necessities out of which types are born will
presumably be indicated in some degree by the varying textures, dendrograms
(hierarchies) and cluster sizes emerging, as discussed in Section 8 below.
However, both for adaptive types and natural law types, there is reason to
expect that: (1) there will arise an unusually high frequency of cases in
which some particular range of scores on parameter x is associated with a
special range on parameter y, because this is functionally useful and is
preserved and multiplied. Secondary pairs of optimum ranges will generally also
exist, but apart from these modes, instances of individuals with other
combinations will be rare; (2) among individuals at these modes some entirely new
organs and therefore dimensions may appear which are not present in the general
"population" (and, therefore, the distribution of these for the general
population would have an extensive positive skew; for example, among the types on
the tea table, only teapots distribute on the "length of spout" variable);
(3) a class which we may call "across species" variables may be practically
normally distributed over members of the whole genus despite many "species type"
segregations, while the class of "within species" variables will, as stated, be
badly skewed. These type concepts thus imply the recognition of three classes
of variables, with greatest relevance, respectively within species, between
species, and across the whole population of the genus; and (4) by reason of
information in these variables, one will in general expect to have to complete
the description of type segregation and distribution by reference to "higher
order" structures, briefly indicated here by the terms textures and hierarchies.

Some slowness in coming to grips with the necessary concepts in this
field must probably be ascribed to certain habits of mind, which favour
simplified mathematical abstractions even when they fail to describe and do honour to
the intrinsic irregularity of the data. Analytical geometers are not easily at
home with topologists, and here even topologists themselves are being forced to
face the intractable specificity of detail elsewhere faced only by topographers.
The problem is very close to that of describing the actual cloud masses at a
given moment in an n-dimensional sky. Even when this goal is admitted, most
people begin by thinking of discrete cumulus clouds neatly spaced in a summer
sky, but are forced at the end to come to terms with the ultimate in irregular
masses: an October storm-wrack. Those who develop geometrical models and
statistical procedures have no alternative but to brace themselves for this
degree of complexity if they wish to describe the variety of human beings in a
society or what actually happens in biological evolution.


3.  Two  Alternative  Principles  for  Locating  Stats 

Anyone who has followed the history of psychologists' attempts to handle
the type concept, with Q-sort, D², discriminant functions, Holzinger's B
coefficient, latent class analysis, etc., must admit that little of theoretical
or practical psychological importance has yet emerged, and he may justifiably
wonder whether the tools and concepts have been adequate. For example,
psychiatrists' syndrome groupings, despite some application of correlation methods by
Began (1952), Huffman (see Cattell), Lorr (1962), Wittenborn (1951), and others,
continue to take their authority from subjective clinical impressions, while in
social psychology and related areas, it is hard to point to any theory which
has arisen from a statistically adequate demonstration of types.
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Elsewhere  (Cattail  t  1951,  1992a),  appeal*  hare  been  mad*  for. recognition 
that  in  classifying  individual*  by  ras«abl*ao*  method* i  (1)  Q-eort1  ia  vitiated 


¹Q-sort, principally propagated by Rogers (1951), is a rank correlation version
of what may be called a Q-bar (Q̄) technique. It is important clearly to
distinguish Q̄-technique (sometimes called Q'-technique), which stops at finding
correlation clusters, from true Q-technique, which is a full factor analytic
technique (the transpose of R-technique) aimed at obtaining dimensions. Since
Q-technique depends on the correlation coefficient, one cannot, for the above
reasons, agree with its otherwise careful and precise use in the extensive
taxonomic work of Sokal and Sneath (1963). Types are not factors. Q̄-technique,
on the other hand, yields types and will do so without throwing away important
evidence if it uses rp instead of r. Incidentally, for brevity we shall refer
to the square matrix, with the same people at top and side, which is a common
beginning of all the above resemblance methods of typing, as a "Q matrix."


so long as variables rather than factors are used, and without a principle for
sampling variables. For indices of resemblance are completely unstable and
meaningless without either resolving variables into factors or taking them in
a stratified sample; (2) use of the correlation coefficient gives misleading
results, for it throws away indispensable information, recording only the shape
similarity of two profiles without reference to level or accentuation (Cattell,
1951); (3) Holzinger's B coefficient (Holzinger and Harman, 1941) disregards the
difference between nuclear and phenomenal cluster structure which is discussed
below; (4) latent class (sometimes called latent "structure") (Lazarsfeld, 1960),
though a statistically clearly developed method, does not meet the need for a
parametric treatment of the assignment of individuals to classes; (5) the
multiple discriminant function is not a means of finding types, but only of
giving emphasis, rigidity, and apparent precision to groupings initially
discovered by other and usually more subjective methods.

Faced with these inadequacies of present type concepts and search methods,
one recognises the possible need for a radical re-orientation. This is reached
first by realising that the idea of type hides two very distinct concepts --
entities which we call stats and aits -- and, secondly, by recognising two
distinct methodological approaches to locating these in nature. The implicit
definition of a stat (or "homostat") above can be sharpened now to a set of
individuals within which the mutual resemblance of all pairs exceeds a certain
value, significantly higher than that obtaining between pairs in the population
at random. Although a segregate (or "ait") is different, we shall find that
the stat is a necessary concept in reaching it, so the location of stats is
first treated here and the operational definition of aits is deferred.
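
The sharpened definition of a stat can be put operationally. The following is only a minimal sketch: the function name `is_stat`, the small resemblance matrix, and the limit of 0.5 are all hypothetical, chosen for illustration, and the resemblance index itself is left open.

```python
from itertools import combinations


def is_stat(similarity, members, limit):
    """Return True if every pair of `members` resembles above `limit`.

    similarity: square symmetric matrix (list of lists) of resemblances.
    members:    indices of the individuals proposed as one stat.
    limit:      the chosen resemblance value that all pairs must exceed.
    """
    return all(similarity[i][j] > limit for i, j in combinations(members, 2))


# Hypothetical resemblances among four individuals.
Q = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, -0.3],
    [0.1, 0.2, -0.3, 1.0],
]

first_three = is_stat(Q, [0, 1, 2], 0.5)   # all three pairs exceed 0.5
with_fourth = is_stat(Q, [0, 1, 3], 0.5)   # pairs involving id 3 fall short
```

The test is on all pairs, not on average resemblance, which is what distinguishes the stat as defined here from looser notions of a cluster.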

To locate a stat, one of two broadly different approaches is open to
us, as follows:

(1) The Inter-Id Relation Method. This starts with people (or other
individual patterns)² as reference points in a space defined by coordinates
corresponding to the factors, etc., by which individuals are measured.

²Some general term is badly needed for the individuals who provide us with the
dimensions (in factor analysis) or types ultimately extracted. Although the
psychologist commonly thinks of people, these entities (each defined as a
pattern), even in psychology, must also include such things as groups,
processes, culture patterns, etc. Since a set of finely defined and
inter-related terms for all elements in the basic data relation matrix has been
adopted in the new Handbook of Multivariate Experimental Psychology (Cattell,
1966), I am proposing to use here consistently the term id for any and all
entities of this kind. That is to say, a Q-matrix is defined as bordered by ids
and having in its cells scalar quantities expressing the relation between ids.
Incidentally, this usage of id is so remote from the other usage in psychology,
if psychoanalysis is to be so regarded, that no comparison can exist.


It calculates the distance of each person from every other, locating first the
dense "plexuses" of people, and secondarily, the position of the centroids of
such groups. Most simply, a square matrix (a "Q matrix") is set up bordered
by the same set of people on the two sides. Into the cells are entered the
quantities which express the similarities of the members of the pairs defined
each by a row and a column. Methods can then be developed to find the clusters
of people constituting stats, and later, aits.

(2) The Density-in-Space or "Cartet Count" Method. Here, one begins
with ids placed in position in a coordinate system by a matrix of scores on
the coordinates. Convenient intervals are then taken on these coordinates to
define "cartets" -- which, in a two-dimensional map, would be squares fixed
by boundaries of latitude and of longitude. A computer program can then be
written to count the number of ids in each such square ("hyper-cube" or, most
generally, "cartet," if we may suggest such a term (after Descartes) for such
a rectangular subspace in a lattice of Cartesian products). Fixing a
"significantly high density" count by relation to the average total density,
one could first set aside stats, and, by secondary process discussed below,
aits. Some experiments would be necessary regarding the size of the component
subsets of Cartesian products in order to best bring out the modal groupings in
relation to the general texture of the domain.
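
The counting step can be sketched in a few lines. This is an illustration only: the function names, the interval width of one standard score, the sample scores, and above all the rule that "significantly high density" means ten times the average cell density are assumptions made here, not values given in the text.

```python
from collections import Counter
from math import floor


def cartet_counts(scores, width=1.0):
    """Count ids per cartet (hypercube cell) of side `width`.

    scores: list of equal-length tuples, one profile of standard scores per id.
    Returns a Counter mapping each occupied cell (tuple of interval indices)
    to the number of ids falling in it.
    """
    counts = Counter()
    for profile in scores:
        cell = tuple(floor(value / width) for value in profile)
        counts[cell] += 1
    return counts


def dense_cartets(counts, n_cells_total, factor):
    """Cells holding at least `factor` times the average number of ids per cell."""
    average = sum(counts.values()) / n_cells_total
    return {cell: n for cell, n in counts.items() if n >= factor * average}


# Hypothetical two-dimensional scores: a tight group plus scattered ids.
scores = [(0.4, 0.6), (0.5, 0.5), (0.6, 0.4), (0.7, 0.7),
          (-2.3, 1.8), (2.1, -1.2), (-0.4, -2.6), (1.5, 2.4)]
counts = cartet_counts(scores, width=1.0)
# Six intervals on each of two coordinates give a lattice of 36 cartets.
dense = dense_cartets(counts, n_cells_total=36, factor=10.0)
```

Only occupied cells are stored, so the memory needed grows with the number of ids rather than with the number of cartets in the lattice.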

One must recognize from the beginning, however, that the "cartet-count"
method will soon reach a number of cartets to be counted that could be onerous
even for an electronic computer. The difficulty is illustrated by the fact
that with only 16 dimensions, and intervals restricted to just 6 in number
(3 plus and 3 minus) subtending each one standard score, for each coordinate
the total number of hypercubes (cartets) which would have to have their
contents examined by counting is 6^16 = 2,820,000,000,000 (approx.). On the
other hand, the number of resemblance entries to be examined in a Q-matrix, by
the inter-id method ((1) above), also increases rapidly (as the square of the
number of ids) and is fairly formidable with four or five hundred people.
Typing procedures, with any adequate sample of ids, are necessarily and
characteristically going to be demanding of computer time. In practice, with
the cartet count approach, one might often be content to use only two coarse
score intervals per coordinate scale, but in a 16 element profile (which is
probably fairly typical of psychological needs), the cube count would still
have to cover 2^16 = 65,536 counts.
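
The sizes involved are easy to check directly. The sketch below simply evaluates the number of cartets for a given number of intervals and dimensions, and the number of distinct pairs in a Q-matrix; the function names are hypothetical.

```python
def cartet_total(intervals, dimensions):
    """Number of cartets when each coordinate is cut into `intervals` parts."""
    return intervals ** dimensions


def q_matrix_pairs(n_ids):
    """Number of distinct pairs of ids, i.e. entries below the diagonal."""
    return n_ids * (n_ids - 1) // 2


fine = cartet_total(6, 16)         # six intervals on each of 16 coordinates
coarse = cartet_total(2, 16)       # two intervals on each of 16 coordinates
pairs_500 = q_matrix_pairs(500)    # resemblance entries for 500 ids
```

With 500 ids the Q-matrix holds 124,750 distinct pairs, which is larger than the coarse cartet lattice but far smaller than the fine one.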

In this article, we shall concentrate on the inter-id approach since the
cartet count procedure is obvious. Here, one needs first to find a suitable
index of resemblance between any two ids, presuming each to be already measured
on a profile of dimensions. For this, and allied purposes, the profile
similarity index, rp, has been developed (Cattell, 1949) as being free of the
drawbacks of r, of Mahalanobis' D (1956), and of some other at times popular
indices.


The profile similarity coefficient, rp, has the formula:

$$ r_p = \frac{2k_m - \sum d^2}{2k_m + \sum d^2} \tag{1} $$


where k is the number of profile elements; each d is the difference, in
standard scores, of the two people concerned, on any one profile element, and
k_m is the median χ² value for k degrees of freedom. At k = 20, k_m = 19.337, so
that above, say, 20 profile elements there is not much argument for using k_m
instead of a simple k in the first part of the numerator and denominator.
The former will exactly divide the possible rp's into equal numbers of positive
and negative values, but the latter will give a zero sum of negative and
positive rp's. The advantages of this rp over the Mahalanobis (1956) distance
function, D, are:

(1) That it gives comparable values from study to study in comparing two
ids, regardless of the different metrics and numbers of profile elements. This
it does because (a) all coordinate values are in standard score, not different
units for each, and (b) the formula allows for differences in the number of
coordinates (profile elements) in evaluating the "distance." Moreover, it
behaves very similarly to the familiar correlation coefficient, registering 0
where the relation between two people is no better than chance, +1.0 when they
are perfectly alike, and -1.0 where they are as unlike as possible. By
contrast, one never knows what the meaning of a particular D value is without
an elaborate consideration of additional circumstantial facts or the making of
additional calculations.

(2) Since different investigations in the same domain often differ
somewhat in the number of dimensions they employ, both of the above features
(1(a) and 1(b)) help in surveys attempting to integrate conclusions about types.

(3) A significance test has been worked out by Horn (1961) for rp and
other properties are under investigation. This and other developments of the
index promise reasonable prospects of programs making it negotiable in further
areas.
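
As a sketch of how rp behaves, the coefficient can be computed as below, taking it as (2k_m - Σd²)/(2k_m + Σd²). The function names are hypothetical, and the median χ² is obtained from the Wilson-Hilferty approximation k(1 - 2/(9k))³, which is an assumption of this sketch and not a step given in the text; exact tables could be substituted.

```python
def median_chi_square(k):
    """Approximate median of chi-square with k degrees of freedom.

    Uses the Wilson-Hilferty approximation; for k = 20 it gives about 19.34.
    """
    return k * (1.0 - 2.0 / (9.0 * k)) ** 3


def profile_similarity(profile_a, profile_b, use_median=True):
    """Profile similarity rp for two profiles given in standard scores."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of elements")
    k = len(profile_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    # Either the median chi-square or the simple k may enter the formula.
    k_m = median_chi_square(k) if use_median else float(k)
    return (2.0 * k_m - sum_d2) / (2.0 * k_m + sum_d2)


identical = profile_similarity([0.0] * 20, [0.0] * 20)
# Two ids five standard scores apart on every one of 20 elements.
far_apart = profile_similarity([2.5] * 20, [-2.5] * 20, use_median=False)
```

Identical profiles give +1.0 whatever the number of elements, and the value falls toward -1.0 as the summed squared differences grow, which is the comparability across studies claimed for the index.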


On the other hand, unlike D, rp does not have simple Euclidean distance
properties. The relation of rp to D (when the latter is put in standard scores
form) is shown in Figure 2. It will be seen that in the central range, it is
approximately linear and the relation is still closer to linearity for the
special rp derivative, rn, proposed below. However, non-Euclidean spaces,
if properly understood, can be as usefully manipulated as Euclidean and the
temptation of convenience offered by the use of familiar space must be rejected
if the Euclidean representation, D, does not also give the greater
psychological convenience, e.g., in vocational selection, etc., which is
provided by the use of rp.


(Insert Figure 2 here)

With this crystallization of an acceptable means by which the similarity
of ids, i.e., of people, stimulus situations, groups, processes, etc.,³


³Parenthetically, to ward off any incorrect assumption of formal narrowness in
our approach, let it be noted that the whole treatment of similarity by
attributes as proposed here includes application to processes as well as
structures. A psychiatrist, for example, may say that his assignment of an
individual to the syndrome type "schizophrenia" includes observations on the
course of onset itself, and the notion of a malign outcome. Such process
attributes can, of course, be included along with structured, "immediate"
measures in the designation of a specific profile of measures. (When rp is used
thus to locate types of processes rather than types of persons, certain time
sequence information, distinguishing a configuration (Cattell, 1957, p. 396)
from a profile, must be included.) The procedure can also be used for grouping
processes as such, as discussed in detail elsewhere (Cattell, 1966).


can be measured, using a profile of dimensions, let us turn to the
next problem. This concerns the use of such an index in the id-relation
procedure for finding types.

4.  Defining  Stats  In  (a)  General  Purpose  Dimensions 
and  (b)  Special  Criterion  Functions 

It is part of the conceptual inadequacy of the approaches hitherto made
by scientists to the type concept -- even in some of the best technical work,
such as that by McQuitty (1963) or Sokal and Sneath (1963) -- that operations
have been set up to find a homostat without recognising that it will not have
any uniqueness of center and boundary, for such uniqueness is characteristic
only of a segregate. This arbitrariness and subjectivity of the stat, not
only in width but also in position and even with an exact index of similarity,
can be most quickly realised by a two dimensional example, as in Figure 1.

The investigator has to begin with some choice of similarity level as
"significant" or "outstanding"; and though a rationale is given below for
computing a finally objective boundary value for rp, the limit of belonging
must initially be arbitrary and tentative. Defining a stat by the property
that every individual in it must resemble every other above this limit means
that in Figure 1 (neglecting the slight departure of rp from a Euclidean
distance) all people within a circle of diameter corresponding to the rp limit
are in the same stat. If the rp limit is well chosen, the majority of such
circles drawn over much of the graph will each enclose fewer than two people.
Only where there is a dense modal accumulation will more than two people, i.e.,
a type, be "lassoed" as, for example, at 1, 2, 3, and 4.

However, we must note (a) that sometimes, as at 2, individuals in two
distinct aits will be caught in one stat, and (b) that, as in 3 and 4, two
correctly defined stats will nevertheless overlap. In fact, there could be
a whole series of stats in a dense area, each including many ids and different
only in the inclusion of one different person from its neighboring stat. This
can readily be seen if we imagine, say, 50 persons evenly spaced in the
given coordinate space in a long row, with a circle diameter chosen to include,
say, 3 persons at a time and finishing with 48 3-stats.
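
The row of 50 persons can be worked through mechanically. In this sketch the persons are placed at unit spacing on a single coordinate (an arbitrary choice), and the function name is hypothetical; the circle is slid along the row and every position that encloses exactly three persons is recorded as a stat.

```python
def stats_in_row(n_ids, ids_per_circle):
    """List the stats found by sliding a circle along a row of evenly spaced ids.

    Ids sit at positions 0, 1, ..., n_ids - 1. The circle diameter is chosen so
    that it spans exactly `ids_per_circle` consecutive ids.
    """
    positions = list(range(n_ids))
    diameter = ids_per_circle - 1
    stats = []
    for start in positions:
        members = [p for p in positions if start <= p <= start + diameter]
        if len(members) == ids_per_circle:
            stats.append(members)
    return stats


row_stats = stats_in_row(50, 3)
n_stats = len(row_stats)
# Neighbouring stats differ in one member and share the other two.
shared = set(row_stats[0]) & set(row_stats[1])
```

Every one of these stats satisfies the definition, yet none has any claim to a unique center or boundary, which is the point of the example.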

Furthermore, when we pursue in Section 6 below the operational steps
for locating stats from Q matrices filled with similarity indices, i.e., for
going from the algebraic to the geometrical view here briefly invoked, it
will be found that the psychologist needs two distinct concepts of stats,
which are there named and defined as nuclear and phenomenal stats.
Nevertheless, despite these complications and uncertainties of ultimate
inference, the recording of stats has both direct value in itself and ancillary
value also in providing a basis for proceeding to aits. With this foretaste of
the problem of discovering stats, to which we shall return, we must pause a
moment to solve a prior problem, one which stands squarely in the way of our
progress, namely, an uncertainty about the very meaning of similarity. For
in spite of the apparent precision of our rp coefficient of profile similarity,
it will become apparent that we cannot use it in all situations we might
encounter until we have corrected it to less restrictive assumptions. In
fact, we must pause briefly in this section to make some almost philosophical
inquiries about the purpose and setting of its use.

The  design  of  rp  has  cleared  it  of  giving  accidental  and  unknown 
weights  to  different  profile  constituents,  but  it  has  left  it  with  the 
rigid  assumption  that  all  dimensions  receive  exactly  equal  nominal  weight  -- 
and  this  may  not  fit  all  purposes. 

As  was  pointed  out  in  the  original  logic  for  rp  (Cattell,  1949)  both 
the  philosopher  and  the  man  in  the  street  have  always  been  haunted  by  a 
distinction between the character of the object in itself ("das Ding an sich"
of  Kant)  and  what  it  does  or  is  useful  for.  (Perhaps  even  Hume's  "primary" 
and  "secondary,"  or  the  theologian's  "grace"  and  "works"  might  be  related  to 
this  distinction.)  Certainly  in  the  operation  of  psychological  prediction, 
we  constantly  and  confidently  make  a  distinction  between  traits  or  "predictors" 
and  predicted  performances  or  "criteria."  Viewing  this  from  the  standpoint 
of  type  distributions,  it  is  easy  to  see  that  the  modal  groupings  we  would 
get on certain criteria will differ from those we would get on the total
profile  of  traits.  By  the  same  token,  for  a  barber,  a  brush,  a  comb,  and  a 
pair  of  scissors  belong  in  the  same  class,  while  a  screwdriver  and  a  bottle 
opener  do  not.  But  by  the  urgent  drinker,  the  bottle  opener,  the  screw¬ 
driver  and  the  scissors  are  seen  as  promising  members  of  a  class  to  which 
brush and comb do not belong.

The present writers, in basic personality research, do not accept the
psychometrist's differentiation of test and criterion as having any fundamental
status. But the difference between the total personality profile of behaviour
extracted traits and a single "criterion" (or any other) performance is real
enough. The latter is always some quite specific mathematical function of the
former. On such a specific function, the modal stats (or aits) will be
peculiar to that derived variable, i.e., different from the distribution on
other functions or in the k-dimensional trait profile space. This is the
mathematical expression of the statement that "all classifications are
subjective, depending on the purpose of the classifier." Indeed, it is the
general profile classification which now begins to look doubtful and subjective
compared to that on the concrete criterion and we find ourselves asking,
"What do we mean when we talk about 'the thing in itself'?" By what right,
for example, did we start by giving equal weight to measures on the k
dimensions of the profile in the rp calculation? The fact that our initial
concept of shape comes from the physical world (Newtonian, at that) fools us in
the wider context, for we are naturally accustomed to giving equal weight to
height, length, and breadth measures. What the psychologist really has to
deal with is a severe case of Einstein's world, with dimensions variously and
severely contractible.

The only firm basis for a system of weights for dimensions, as pointed
out by Burt (1937), by the present writer (1946, 1957) and by Kaiser and
Caffrey (1965), is a concept of a population or universe of behavioural
variables, from which the dimensions derive. A rigorous and operational
basis for dimension weights in the personality realm has long been available
in Cattell's personality sphere concept (Cattell, 1946; Cattell and Warburton,
1967). Employing equal weights for the k elements of the profile used in rp
is therefore justified only if one has demonstrated that these dimensions
approach a certain relation to the personality sphere. That relation is that
the squared loadings of each factor over the personality sphere of variables
sum to the same value as for all other factors (or, since in practice one
must work with a random or a stratified sample, that they approach equality
within the limits of error). At the present stage of knowledge about primary
personality factors, it seems quite unlikely that they will show such equality.
Consequently, we shall undertake in the following section to generalise the
coefficient of profile similarity, rp, to meet the need for unequal weighting,
as well as to add other needed flexibilities.

5.  Varieties  Within  the  Family  of  Profile  Similarity  Coefficients 

From the preceding section, it becomes evident that the commonly
used profile similarity coefficient, rp, is really a special case:
one of many possible formulae within a family of coefficients. There could,
for example, be weights and polynomial expressions for calculating similarity
(or "distance") with respect to all kinds of relations to criteria and
particular combinations of criteria. The ordinary rp is a special case from
these in operating with a linear combination of squared differences which
gives equal weight to all dimensions. Furthermore, its quadratic form
specifically assumes a non-linear, parabolic relation of individual traits
to criteria. This means that in evaluating the extent to which an individual
belongs to a clinical syndrome "type" (or to take another example, his
"adjustment" to the ideal, stable profile of those in a given occupation), it

(a) penalises equally for under and overshooting the criterion value and

(b) does so in terms of the square of the deviation involved.

Obviously, modifications could be made in the similarity formula to fit
all kinds of assumptions about the relation of trait to performance, which
could be expressed in various polynomials relating profile factor scores to
the criterion. Indeed, one instance of modifying the rp index which may be
briefly mentioned, because it is actually more consistent with the linear
factor specification equation widely used in psychology than is rp, is what
we shall distinguish as the coefficient of linear similarity, rs. In this,
the signs of the d's, standard score differences of two persons on the
succession of factors, are preserved in the addition and the coefficient will
indicate not only the degree of similarity of two persons, but also which
is positive (higher) relative to the other. It is defined by

$$ r_s = \frac{2k\sum b - \sum bd}{2k\sum b + \sum bd} \tag{2} $$

where there are k profile elements, the b's are the factor weights and the d's
the differences on the factors, person p_i always being subtracted from person
p_j.

The expression, rs, will preserve consistency with the familiar linear
specification equation, but the similarities thus calculated will lose any
relation to a Euclidean distance. Yet another member of the profile similarity
family, and one which succeeds in approaching Euclidean distance properties
even better than rp (see Figure 2), is what we may call the coefficient of
nearness, rn, defined as follows:*

$$ r_n = \frac{\sqrt{2k} - \sqrt{\sum d^2}}{\sqrt{2k} + \sqrt{\sum d^2}} \tag{3} $$


*Strictly the expected value of $\sqrt{\sum d^2}$ is
$\sqrt{2k}\,\left(1 - \frac{1}{4k} + \frac{1}{32k^2} + \frac{5}{128k^3} + \dots\right)$;
but $\sqrt{2k}$ is a close enough approximation if k is not too small.

The greater conformity of rn to Euclidean space (i.e., to a generalised
D) is shown in the graphs of Figure 2. Like all members of the rp family, rn
has a numerically immediate meaning as a similarity coefficient in that it
yields 0 when the relationship is an average, random value; it reaches +1.0
for exact likeness; and approaches -1.0 for maximum dissimilarity.
What recommends it less than rp is that its distribution skews more,
approaching -1.0 very slowly. At a 5 sigma difference on every element it is
still only approximately -0.6. This slowness to approach -1.0 may express a
necessary truth, namely that in any ordinary biological or social population
extreme opposites are much more rare than individuals who closely resemble
each other. This inference, as well as certain other properties of rn and
rp, warns us that in averaging and in other manipulations of pattern similarity
coefficients we need to watch certain pitfalls. Since there has been
practically no reported experience with rn, whereas rp has been appreciably
tried out, our further discussion will keep to the latter, considering the
further issues of weighting and obliquity only in regard to the generalized rp
formula.
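
The slow approach of rn to -1.0 can be checked numerically. The sketch below assumes the nearness coefficient is (√(2k) - √Σd²)/(√(2k) + √Σd²), with rp taken in its simple-k form for comparison; the function names and the choice of 16 profile elements are illustrative.

```python
from math import sqrt


def profile_nearness(profile_a, profile_b):
    """Coefficient of nearness rn, using unsquared Euclidean distance."""
    k = len(profile_a)
    distance = sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))
    expected = sqrt(2.0 * k)   # approximate expected distance for k elements
    return (expected - distance) / (expected + distance)


def profile_similarity_simple(profile_a, profile_b):
    """rp with the simple k in place of the median chi-square."""
    k = len(profile_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return (2.0 * k - sum_d2) / (2.0 * k + sum_d2)


# Two ids differing by 5 standard scores on every one of 16 elements.
high = [2.5] * 16
low = [-2.5] * 16
rn_far = profile_nearness(high, low)
rp_far = profile_similarity_simple(high, low)
```

Under these assumptions rn comes to about -0.56 while rp is already about -0.85, and neither value depends on k when the difference is the same on every element.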

Published uses of rp to date have used the specific, non-generalised
form, which has two main assumptions: (1) that the factor measurements are
orthogonal, and (2) that the elements (factors) are to be given equal weight.
Yet most known personality and ability source traits stand obliquely to one
another so that assigning equal nominal weights to them would not give equal
statistical weights. And often we wish to give them known unequal weights,
which, incidentally, implies also that we are giving certain weights to the
higher strata (Cattell, 1965) factors arising from the oblique factors.

Probably  it  would  be  correct  to  say  that  most  psychologists  implicitly 
assume  in  comparing  personalities  that  they  want  to  give  equal  weight  to  each 
and  every  behaviour  in  real  life,  i.e.,  to  consider  the  realm  of  criterion 
performance  as  the  basis  for  perspective.  If  so,  they  should  recognise  that 
to  achieve  this  goal  it  will  nevertheless  be  necessary  to  give  unequal  weights 
to  the  factors.  Unequal  weights  are  necessary  because  in  predicting  variables 
constituting  a  stratified  sampling  of  the  universe  of  behaviour  we  are  likely 
to  find  some  factors  more  "important"  than  others.  A  precise  expression 
(granted  an  available  defined  total  population  of  variables)  for  the  differing 
importance  of  individual  factors  can  be  obtained  from  estimates  of  the  mean 
variance  contribution  of  each  factor  across  the  population  of  variables,  i.e., 
by the root average squared sums of the factor loadings for the given factor
(the "latent root" in the orthogonal case) as follows:


where  bjx  is  the  loading  of  variable  x  on  factor  j. 
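
Read literally, the verbal description suggests a weight of the following form. The symbols w_j (the weight of factor j) and n (the number of variables in the sampled universe) are assumptions introduced for this sketch and are not notation taken from the text:

$$ w_j = \sqrt{\frac{1}{n}\sum_{x=1}^{n} b_{jx}^{2}} $$

On this reading, equal weighting of the k factors would be justified only when the w_j come out approximately equal across factors.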

Other rationales for weighting may be proposed, but regardless of their
nature we shall need a generalized rp for any obliquity and any weight. Let
us begin with the essential form behind equation (1), namely,

$$ r_p = \frac{E_k - d^2_{xy}}{E_k + d^2_{xy}} \tag{5} $$

where d²xy is the squared distance apart of two people, x and y, in a k
dimensional Euclidean space and E_k is the expected distance for k dimensions.
But d²xy can no longer be simply $\sum_{j=1}^{k} (z_{jx} - z_{jy})^2$ (or, in
matrix notation, $z'_{d(xy)} z_{d(xy)}$). For we must now take into account the
correlations, $r_{f_j f_l}$, between the source traits (factors) j and l and
others, which we may write as the usual matrix R_f, and we must also include
the weights assigned to the factors, which we will write into the k by k
diagonal matrix D_w. Then:

$$ d^2_{(xy)} = z'_{d(xy)} D_w^{1/2} R_f D_w^{1/2} z_{d(xy)} \tag{6} $$

The expected value of d²(xy) is no longer 2k, but is:

$$ E_k = \operatorname{trace}\left(D^{1/2} L' D_w^{1/2} R_f D_w^{1/2} L D^{1/2}\right) \tag{7} $$

where D is the diagonal matrix of latent roots of $z_{d(xy)} z'_{d(xy)}$ and L
the matrix of the associated latent vectors.

If one wishes to revert to the special case so far employed -- the
orthogonal, equal weight rp -- it is easily done by inserting r = 0 and
w = 1 in the above. The computing convenience of the orthogonal
approximation we have been using (acceptable when only minor obliquities exist)
is thus very substantial and attractive; for the user of the oblique formula
is compelled to work out afresh for each case the complex expression
$z'_{(xy)} D_w^{1/2} R_f D_w^{1/2} z_{(xy)}$. To employ the simple (orthogonal)
approximation, on the other hand, it suffices only to enter a nomograph with
the individual d² value (Table in Cattell and Eber, 1966). However, with the
help of a computer program, based on (7), the use of the exact oblique formula,
even with quite large numbers of individual cases, presents no real problem.
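
A computer program of the kind mentioned might be sketched as follows. Several things here are assumptions and not the authors' program: the function names; the reading of the weighted oblique distance as z_d' D_w^(1/2) R_f D_w^(1/2) z_d; and, for the expected value, the supposition that the two ids are independent draws whose factor scores have covariance R_f, so that the difference vector has covariance S = 2 R_f. With S = L D L', the trace in (7) equals trace(A S), where A = D_w^(1/2) R_f D_w^(1/2), which avoids extracting latent roots.

```python
from math import sqrt


def weighted_oblique_d2(z_x, z_y, factor_corr, weights):
    """Quadratic form z_d' D_w^(1/2) R_f D_w^(1/2) z_d, with z_d = z_x - z_y."""
    scaled = [sqrt(w) * (a - b) for a, b, w in zip(z_x, z_y, weights)]
    k = len(scaled)
    return sum(scaled[i] * factor_corr[i][j] * scaled[j]
               for i in range(k) for j in range(k))


def expected_d2(factor_corr, weights):
    """trace(A S) with A = D_w^(1/2) R_f D_w^(1/2) and S = 2 R_f (assumed)."""
    k = len(weights)
    total = 0.0
    for i in range(k):
        for j in range(k):
            a_ij = sqrt(weights[i]) * factor_corr[i][j] * sqrt(weights[j])
            total += a_ij * 2.0 * factor_corr[j][i]
    return total


def generalized_rp(z_x, z_y, factor_corr, weights):
    """Generalized rp for oblique factors and unequal weights."""
    e_k = expected_d2(factor_corr, weights)
    d2 = weighted_oblique_d2(z_x, z_y, factor_corr, weights)
    return (e_k - d2) / (e_k + d2)


# Orthogonal, equal-weight special case: r = 0 and w = 1.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
unit_weights = [1.0, 1.0, 1.0]
e_orthogonal = expected_d2(identity, unit_weights)
rp_orthogonal = generalized_rp([1.0, 0.0, -1.0], [0.0, 0.0, 1.0],
                               identity, unit_weights)
```

In the orthogonal, equal-weight case the expected value reverts to 2k and the quadratic form to the plain sum of squared differences, so the sketch reproduces the simple rp.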

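The formula for the profile nearness coefficient -- (3) above, using
d's without signs, instead of d² -- when correspondingly adapted to specific
source trait obliquities and weightings becomes:

$$ r_n = \frac{E_k - \left(z'_{d(xy)} D_w^{1/2} R_f D_w^{1/2} z_{d(xy)}\right)^{1/2}}{E_k + \left(z'_{d(xy)} D_w^{1/2} R_f D_w^{1/2} z_{d(xy)}\right)^{1/2}} \tag{8} $$

Here, to a first approximation:

$$ E_k \approx \left(\operatorname{trace}\left(D^{1/2} L' D_w^{1/2} R_f D_w^{1/2} L D^{1/2}\right)\right)^{1/2} \tag{9} $$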

The distribution and significance limits for rn, corresponding to
those obtained for rp (Horn, 1961) remain to be worked out, so the further
steps and applications we now propose to follow are best considered to
employ rp.


6. THE OPERATIONAL DEFINITION OF PHENOMENAL AND NUCLEAR
CLUSTERS (OR CLIQUES) IN AN INCIDENCE MATRIX


With the above treatment of the problem of calculating similarity
(a reciprocal function of distance) as such, for any two ids, we are ready
for the operations in finding types. In the first step from this similarity
value -- be it rp, rn, D or any other consistent concept -- toward classifying
people in types one must introduce a limiting value -- arbitrary or natural --
in order to shift from a quantitative or parametric to a qualitative or
categorical treatment. At some point one must end by speaking of people as
"in" or "out" of a type, though degrees of belonging may also be used later.

Although we must never lose sight of the metric origin of the cutting
point, and the way in which its choice can affect the grouping, we now
propose to convert the Q matrix of rp's into an "incidence matrix". Therein,
if a certain limiting positive rp value is exceeded in the original Q matrix
a unity is entered, to designate a linkage, whereas if rp is not positive, or
is below this significance limit, a zero is entered in the cell to show that
the two people are unrelated. There will thus be no negative values, but only
0's and positive unities in the incidence matrix.*


*This is perhaps the place to point out that the reciprocity of R- and Q-
technique practices breaks down in one important respect: one can meaningfully
reflect tests but not people. Consequently, one cannot meaningfully reflect
the signs of rp coefficients (to make them positive) by reflecting one of the
two people. It is true that conceptually we may do so, and that we recognise a
special logical affinity of opposites, as when we talk in one breath of angels
and devils, and theology insists that Lucifer had to be a fallen angel. But
what is the opposite of a chair? Opposites to existing objects may be
mathematically conceivable, by logical fiat, but not consistent or conceivable
in scientific properties. Certainly for most objects opposites simply do not
exist in any actual world of data. So, like D'Artagnan, we may assert "Le
diable est mort" without becoming atheists! In short, in the whole process
of mapping similars we are not required to consider opposites, and certainly
we are not permitted to make reflections in Q-matrix id entries. Parenthetically,
with correlations of persons, reflecting even a test upsets the
inter-person similarity value, as pointed out by Cattell (1952a) in the early
discussions of Q-technique, and illustrated pointedly in a recent paper by
Howard and Diesenhaus (1965).
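
The conversion of a Q matrix into an incidence matrix can be sketched in a
few lines. This is an illustration only and not the Taxonome program: the
function name incidence_matrix, the 3 x 3 matrix of rp values and the cutting
point of 0.5 are assumed for the example, and the diagonal is set to unity on
the assumption that every id counts as linked to itself.

```python
def incidence_matrix(q, cutoff):
    """Return a 0/1 matrix with unity wherever rp exceeds the positive cutoff."""
    if cutoff <= 0:
        raise ValueError("the cutting point is assumed to be positive")
    n = len(q)
    # Negative and sub-threshold rp values both become zero; nothing is reflected.
    return [[1 if (i == j or q[i][j] > cutoff) else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical rp values for three ids.
q = [
    [1.0, 0.7, -0.4],
    [0.7, 1.0, 0.2],
    [-0.4, 0.2, 1.0],
]
links = incidence_matrix(q, cutoff=0.5)
print(links)
```

Only the first two ids are linked here; the negative value and the small
positive value are both entered as zeros.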


Once the abstraction of the incidence matrix is reached, with "links"
taking the place of similarity values, both the scientific model and the
computer program we are developing for it take on broader reference and
utility. In most respects they apply both to the personality and cultural
psychologist's (as well as the biologist's) need to find types and to the
sociologist's need for an objective basis for locating cliques and
communication networks (Cattell, 1963). These aims formally express themselves
in finding what we have called stats (not segregates). Within stats themselves,
however, two distinct sub-concepts are now needed: phenomenal clusters
(or stats) and nuclear clusters (or stats). A phenomenal cluster (henceforth
p-cluster for short) corresponds to what is perhaps the simplest operational
definition of a homostat as a homogeneous set of ids. It is defined as a set
of ids each of which is linked to every other (and which does not exclude any
other id similarly linked to the set). Spatially this means that all fall
within a hypersphere of diameter fixed by the similarity coefficient level
accepted as a link. The word "phenomenal" is used because such a cluster is
directly obvious and given in the data relations, whereas a nuclear cluster
(henceforth n-cluster), as we shall see in a moment, has a less direct
definition, because it requires an extra operation of abstraction.
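
The two conditions in the definition of a p-cluster (every member linked to
every other, and no outside id linked to all members) can be checked directly
on an incidence matrix. The sketch below is illustrative; the function name
is_p_cluster and the four-id matrix are assumptions made for the example.

```python
def is_p_cluster(links, members):
    """True if `members` are mutually linked and no other id is linked to all of them."""
    members = set(members)
    # Condition 1: each id in the set is linked to every other id in the set.
    for i in members:
        for j in members:
            if i != j and not links[i][j]:
                return False
    # Condition 2: the set does not exclude any id similarly linked to it.
    for k in range(len(links)):
        if k not in members and all(links[k][m] for m in members):
            return False
    return True


# Hypothetical incidence matrix: ids 0, 1, 2 are mutually linked; id 3 is linked only to id 2.
links = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
print(is_p_cluster(links, {0, 1, 2}), is_p_cluster(links, {0, 1}))
```

The pair {0, 1} fails the second condition because id 2 is linked to both of
its members and so cannot be left out.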

Obviously the number and the nature of the p-clusters found in given
data will alter with the id from which search is started and with the cutting
point on rp which is used as a similarity limit, i.e., translates as a link
in the incidence matrix. Different groupings will appear as the limit is
dropped, just as the sand bars in an estuary change shape with the tide.
Some typologists, both in psychology and biology, have been frankly arbitrary,
setting some value from +0.5 to +0.8 as a limit according to "judgement".
Since arbitrariness of this degree is unsatisfactory, two possibilities of
objectivity need to be considered. First, one may shift the decision to a
decision on the number of types one expects to find, which is the inverse
of the average size of a type, in terms of percentage of the total population
included. (If one visualizes a two-space filled with adjoining circles, now
large, now small, he will see what the alternatives mean.) This remains on
a completely arbitrary basis, but it is one which can be referred more directly
to the goals of systematics in the given field than can the rp value per se.
Secondly, one can take a cutting point dictated by the distribution of the
distances in the ids themselves in the sample. For example, in a sample of
100 a critical distance might be chosen such that most ids will stand as
isolates. (Or, in general, most clusters will contain only 1 per cent of
the population.) This recognizes the relativity of types, e.g., that a
hundred people shoulder to shoulder counts as a crowd in Times Square or
Piccadilly, but six people within sight of one another indicates a group if
found in the Sahara. In the last resort this encounters the same arbitrary
decision as the first method: "What fraction of your population do you want
to include in types?" However, it does suggest an initial objective operation,
namely, to take as the cut-off point the mean of the positive rp's in the
matrix, or to take the mean of the rp's from random normal deviates for k
profile elements. This latter, incidentally, will not be exactly zero, but
it will make roughly half the links significant. Table 1 shows values thus
generated, to illustrate their dependence on the number of elements.
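
The first of these objective operations, taking the mean of the positive rp's
as the cut-off, is simple enough to sketch. The function name
mean_positive_cutoff and the 4 x 4 matrix of values are assumptions made for
illustration; each pair of ids is counted once and the diagonal is ignored.

```python
def mean_positive_cutoff(q):
    """Mean of the positive off-diagonal entries of a symmetric similarity matrix."""
    n = len(q)
    positives = [q[i][j] for i in range(n) for j in range(i + 1, n) if q[i][j] > 0]
    if not positives:
        raise ValueError("no positive similarity values in the matrix")
    return sum(positives) / len(positives)


# Hypothetical rp values for four ids.
q = [
    [1.0, 0.8, 0.4, -0.2],
    [0.8, 1.0, 0.6, -0.5],
    [0.4, 0.6, 1.0, 0.2],
    [-0.2, -0.5, 0.2, 1.0],
]
cutoff = mean_positive_cutoff(q)
print(round(cutoff, 3))
```

With these values the cut-off comes out at about 0.5, so that only the 0.8 and
0.6 entries would be entered as links.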

(Insert  Table  1  here) 

Table 1 answers the question sometimes raised: "If we take n times as
many people randomly distributed in the same space will not the average
distance of each person from every other be correspondingly reduced?"
Here Mahalanobis' D will be more susceptible to sampling, but rp scarcely at
all, as Table 1 shows, for although there will be an increase in the total
number of similar people there will be a corresponding increase of those who
are dissimilar, i.e., mutually correlating 0 to -1.0. However, for a given
limit of admission by rp to a homostat more people will, of course, be included
in absolute terms, if the population structure remains the same, with a large
than with a small sample. Sampling laws for stats and aits remain to be worked
out, but to a close approximation multiplying the sample size by n will
multiply the number in any given diameter of stat by n. Consequently all
type structure statements should at some stage be converted to percentages
and further analysis pursued on that basis.

Granted an agreed critical cutting point on rp, leading to a linkage
Q-matrix, by what systematic operations can one derive the p-clusters?
A Boolean algorithm for this purpose will be described in the next section,
but here we have still to complete the conceptual distinction of phenomenal
and nuclear clusters and so for the moment we shall take a small example in
Figure 3 in which the phenomenal cluster is obvious from Table 2. In fact
three instances of p-clusters are illustrated topologically in Figure 3,
namely, a b e f g; a b c d h; a b c d e i.

(Insert  Figure  3  here) 

It will be noticed, however, that the first two p-clusters overlap with
respect to ids a and b. That is to say, a and b are linked in all necessary
ways for a p-cluster with e, f and g on the one hand and c, d and h on the
other; but c, d and h are not linked with f, g and e. The term nuclear
cluster, or n-cluster, is therefore given to a, b. If one now considers the
third p-cluster (No. 2) in Figure 3(i), he will note that the nuclear cluster
concept can get complicated, to the extent that "orders" of nuclear clusters
must be introduced, according to the number of p-cluster overlaps involved.
Thus c and d are in a two p-cluster n-cluster, but a and b are sustained by
a three p-cluster overlap. An n-cluster finishes by being more than the
definition of a simple stat: it is a stat with additional "structural"
properties.
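
The "orders" of nuclear clusters can be illustrated on the three p-clusters
just listed. The sketch below takes one possible reading, namely that an
n-cluster of order m is the set of ids common to m p-clusters; the function
name n_clusters and this reading are assumptions, not a statement of how the
Taxonome program tabulates overlaps.

```python
from itertools import combinations


def n_clusters(p_clusters):
    """Map each overlap order to the distinct non-empty intersections of that many p-clusters."""
    result = {}
    for order in range(2, len(p_clusters) + 1):
        found = set()
        for combo in combinations(p_clusters, order):
            common = frozenset.intersection(*combo)
            if common:
                found.add(common)
        if found:
            result[order] = found
    return result


# The three p-clusters named in the text for Figure 3.
p_clusters = [frozenset("abefg"), frozenset("abcdh"), frozenset("abcdei")]
overlaps = n_clusters(p_clusters)
for order, sets in sorted(overlaps.items()):
    print(order, sorted("".join(sorted(s)) for s in sets))
```

On this reading c and d appear only in a two p-cluster overlap (a b c d),
while a and b are the whole of the three p-cluster overlap.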


(Insert  Table  2  here) 

As instances (i) and (ii) in Figure 3 suggest, the structural varieties
of n-clusters according to the associated form of relation of p-clusters can
be very diverse. And since the description of a population sample in terms
of p-clusters alone may vary (as pointed out by our tides and sandbanks
analogy, showing groupings to alter according to the cut-off level on rp),
the n-cluster description will also change with the critical cut-off value.
Consequently, to approach an adequate description of a domain it is desirable
to present groupings at each of several, say three, standard levels (for
which experience suggests rp = 0.2, 0.5 and 0.8), as a cartographer presents
contour lines only at standard levels. For convenience these levels may need
adjusting to the parametric properties of the given data, as in our analogy
of the Sahara and Times Square. On the other hand, if certain standard rp
levels could be agreed upon in type research generally, it would advantageously
permit comparisons of various domains for what in our Introduction
we briefly called texture. Texture can now be given more specifically the
meaning of the number of p-clusters, of various percentage sizes, at various
cutting levels, plus the n-cluster sizes at various numbers of p-cluster
overlaps, etc. With this glance at the number of summarizing statements
required, our introductory statement about the inappropriateness of hoping
for a simple, single, boiled-down mathematical statement when mapping clouds
in k-dimensions will be more self evident: we are dealing with topography.


7. THE BOOLEAN CLUSTER SEARCH ALGORITHM FOR FINDING STATS

Let us now consider the logical and computational requirements in
proceeding from a given incidence matrix, as in Table 2, to a statement
about stats (as n- and p-clusters) such as is summarized visually in
Figure 2. It is this step which will provide the basis of the Taxonome
computer program. For finding clusters in correlation matrices, Cattell
(1952) originally proposed the ramifying linkage method algorithm, but
subsequent use showed the need for an additional step, and we now call the
revised method the Boolean cluster search method.

It still begins with the ramifying linkage method, which proceeds from
the original Q matrix (henceforth Q0 for the basic matrix, to distinguish
it from subsequent derivatives, analogously to V0, V1, V2, etc., in factor
analysis). Herein one works sequentially through the given links for one
person after another, i.e., column by column or row by row in Table 2, at
each step deleting any ids not directly linked to those found in the earlier
columns. It will be found that in this comparatively simple example the
ramifying linkage method alone leads reliably to the clusters shown in
Figure 2. However, for the sake of illustrating certain higher derivatives
we shall turn to a new but still small example presented by Table 3, to
illustrate the need for the full Boolean process. Beginning with the incidence
matrix among ten ids in Q0, the process (and the subsequent computer
program) first scans column 1 and thus notes the set of persons related to
person 1, namely, persons 5 and 7. It proceeds next to column 5 and notes
that person 5 is related to person 7; so 1, 5 and 7 form a cluster.

(Insert  Table  3  here) 

Incidentally, in setting up the Q0 matrix a triangular form is
sufficient, for if ids i and j are related, then the (i,j) and the (j,i)
elements of Q0 are 1, but computationally it is more convenient to use the
whole matrix, recognizing, however, that this may result in our finding the
same cluster twice.

From Q0 our aim is to produce a matrix G1 (for "grouping matrix")
giving an initial statement of existing clusters according to the ramifying
linkage method. As we encounter each link in column 1 we must decide if the
id (person) concerned is also linked directly with other persons having
links in that column. To decide this we must see if for every entry of
unity above him there exists a corresponding entry of unity in his row
(or equivalently, column) of Q0. (The method as originally described by
Cattell required comparison with all unit entries below the one being
considered, a logically equivalent procedure though slightly less efficient
for computing.) So, in Q0 of Table 3, we see that 5 is linked to 1, then
going down the column to the next unit entry we find that 7 also belongs to
the group since when we look along the 7 row there is one unity in column 5,
i.e., a link of the two persons already included in the group. Column I,
for a contingent group, is therefore started in matrix G1.

Going next to column 2 of Q0 we find persons 2 and 6 form a group, from
column 3 that 3 and 7 form a group, and these are entered in G1 as columns II
and III. Column 4 contains a single unity and need not be considered. Working
down column 5 we include 1 and 5, but on examining person 6 we find a
zero in the first column of row 6, so 6 does not belong in the group and the
unity corresponding to person 6 in G1 is changed to zero. 7 is related to 1
and 5, and so is included. However, the group now found is identical to
group I and so we do not include it in G1. Similarly, we work through columns
6 to 10, finding in all the five distinct groups listed in Table 3 as the
columns of G1.⁶
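
The column-by-column scan just described can be sketched as follows. Table 3
is not reproduced here, so the seven-id incidence matrix below is a
hypothetical one, chosen only so that persons 1, 5, 6 and 7 behave as in the
text; the function names from_pairs and ramifying_linkage are likewise
assumptions of this sketch.

```python
def from_pairs(n, pairs):
    """Build a symmetric incidence matrix (unit diagonal) from 1-based id pairs."""
    links = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for a, b in pairs:
        links[a - 1][b - 1] = links[b - 1][a - 1] = 1
    return links


def ramifying_linkage(links):
    """Scan each column; keep an id only if it is linked to every id kept above it."""
    n = len(links)
    groups = []
    for col in range(n):
        group = []
        for row in range(n):
            if links[row][col] and all(links[row][m] for m in group):
                group.append(row)
        ids = sorted(i + 1 for i in group)  # report 1-based person numbers
        # One-person groups are skipped, and a group already found is not repeated.
        if len(ids) >= 2 and ids not in groups:
            groups.append(ids)
    return groups


# Hypothetical links among seven persons (person 4 is an isolate).
links = from_pairs(7, [(1, 5), (1, 7), (5, 7), (5, 6), (6, 7), (2, 6), (3, 7)])
groups = ramifying_linkage(links)
print(groups)
```

In this small case the scan of column 5 drops person 6 (not linked to
person 1) and so rediscovers the group 1, 5, 7; the mutually linked set
5, 6, 7 is never reached by the sequential scan.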


6. Two points must be noted about the ramifying linkage method. Firstly,
some of the clusters initially found may be subsets of other clusters. This
presents no problem. Secondly, due to the sequential nature of the procedure,
not all clusters may initially be found, at least where certain unusual
configurations exist. (This is the reason for the next step from the G1 matrix.)
Thus in Table 3(a) the group consisting of persons 5, 6 and 7 is not found.

We do not include phenomenal clusters of only one person, which correspond
to a column with only a diagonal element that is non-zero, e.g., columns 4
and 8 of Table 3, Q0.


Actually, the ramifying linkage method is best regarded as a first
step, in the way that taking out a first factor came to be regarded as only
the first step in a multiple factor analysis. Indeed, the formal similarity
to factor analytic steps is appreciable, for our procedure is to set down a
first phenomenal cluster matrix, G1, from the ramifying linkage "extraction"
process, and make therefrom a product matrix, Q1, which, subtracted from Q0,
leaves a first residual, Q2. Thus, step 2 in Table 3 is:


Q1 = G1 · G1'                                                        (10)


where the prime denotes a transpose and the period denotes Boolean matrix
multiplication, i.e., a matrix multiplication with arithmetic addition and
multiplication replaced by logical addition ('or') and multiplication ('and').
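
A Boolean product of this kind is easily written out. In the sketch below the
grouping matrix has one row per id and one column per cluster; the function
name boolean_product_with_transpose and the four-id, two-cluster matrix are
assumed for illustration.

```python
def boolean_product_with_transpose(g):
    """Return G . G' with 'or' in place of addition and 'and' in place of multiplication."""
    n = len(g)
    return [[1 if any(a and b for a, b in zip(g[i], g[j])) else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical grouping matrix: cluster I holds ids 1 and 2, cluster II holds ids 2 and 3;
# id 4 belongs to no cluster.
g = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]
product = boolean_product_with_transpose(g)
print(product)
```

An entry of the product is unity exactly when the two ids appear together in
at least one cluster. The diagonal entry for id 4, which belongs to no
cluster, is zero.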

If G1 should contain all p-clusters, then we must have

Q0 = Q1

since a link (other than a diagonal one) in Q0 indicates that two persons
are related, and so they must appear together in at least one phenomenal
cluster. The operation G1 · G1' simply determines which persons appear
together in phenomenal clusters. Table 3(2) gives Q1 for the example.

Zeros in Q1 corresponding to unities in Q0 have been denoted by x's,
indicating that in this case not all phenomenal clusters have been found.
Now the new "residual" incidence matrix, Q2, is formed from the x's of Q1,
plus any element in their columns (a) that was unity in Q0, and (b) for
which there is also an x in its row of Q2. Such an element might form a
phenomenal cluster with the x's and so needs to be included. Table 3(3)
gives the Q2 for the example. Using the ramifying linkage method we now
find additional phenomenal clusters -- in this case one, No. VI -- which
we include with those already found to form G2.

Then,

Q2 = G2 · G2'                                                        (11)

and

Q2 = Q1

if all phenomenal clusters have been found. We proceed in this way until
we find a Gn such that Qn = Qn-1, except possibly for some diagonal
elements. In the example, Table 3, Q3 = Q2 except for the (4,4) and (8,8)
elements, so G2 contains all the phenomenal clusters in Q0.
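
The whole cycle (ramifying linkage, Boolean product, residual, ramifying
linkage again) can be put together in a short sketch. It follows the verbal
description literally and makes no claim to reproduce the Taxonome program;
the function names, the stopping test (no new group found in the residual)
and the seven-id example are assumptions, and the example is not Table 3.

```python
def from_pairs(n, pairs):
    """Build a symmetric incidence matrix (unit diagonal) from 1-based id pairs."""
    links = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for a, b in pairs:
        links[a - 1][b - 1] = links[b - 1][a - 1] = 1
    return links


def ramifying_linkage(links):
    """One pass of the ramifying linkage method; returns groups of two or more ids."""
    groups = []
    for col in range(len(links)):
        group = []
        for row in range(len(links)):
            # Keep an id only if it is linked to every id already kept.
            if links[row][col] and all(links[row][m] for m in group):
                group.append(row)
        if len(group) >= 2 and frozenset(group) not in groups:
            groups.append(frozenset(group))
    return groups


def product_matrix(n, groups):
    """Boolean product G . G': unity where two ids share at least one group."""
    q = [[0] * n for _ in range(n)]
    for group in groups:
        for i in group:
            for j in group:
                q[i][j] = 1
    return q


def residual_matrix(links, product):
    """The missed links (the x's) plus the Q0 links among ids whose rows carry an x."""
    n = len(links)
    has_x = [any(links[i][j] and not product[i][j] and i != j for j in range(n))
             for i in range(n)]
    return [[1 if links[i][j] and has_x[i] and has_x[j] else 0 for j in range(n)]
            for i in range(n)]


def boolean_cluster_search(links):
    """Repeat ramifying linkage on successive residuals until nothing new appears."""
    groups = ramifying_linkage(links)
    while True:
        residual = residual_matrix(links, product_matrix(len(links), groups))
        new = [g for g in ramifying_linkage(residual) if g not in groups]
        if not new:
            return groups
        groups.extend(new)


links = from_pairs(7, [(1, 5), (1, 7), (5, 7), (5, 6), (6, 7), (2, 6), (3, 7)])
found = [sorted(i + 1 for i in g) for g in boolean_cluster_search(links)]
print(found)
```

For this hypothetical matrix the first pass yields 1, 5, 7; 2, 6; and 3, 7,
and the residual step adds 5, 6, 7, after which no uncovered links remain.
Groups found in a residual are mutually linked in Q0 but are not checked here
for the second condition of the p-cluster definition.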


8. PROCEEDING FROM STATS TO AITS,
TO DENDROGRAMS AND TO TEXTURES

By adding a simple search and counting procedure which will list the
overlaps among the p-clusters for the algorithm just described, the findings
up to this stage can be systematically recorded, as briefly indicated
above. They will finally appear as a print-out of (a) p-clusters and
(b) n-clusters. To be comprehensive of possibly needed information these
lists will in detail comprise:

(1) For p-clusters: (i) a listing of actual id members, (ii) arranged
in order of size from 2 membership upward, (iii) attachment of identifying
numbers to clusters, and (iv) expression of size in percentages of same and
calculation of the distribution by cluster frequency, as shown in Table 4.

(Insert Table 4 here)

(2) For n-clusters: (i) a listing of actual id members, (ii) attachment
of identifying numbers to cluster, (iii) arrangement in this case in a
two-way table, by size (expressed as percentage) and by number of p-clusters
involved in the overlap, (iv) a distribution analysis on both of these. For
the data of Table 2 this is shown in Table 5.


(Insert Table 5 here)

To complete the general statement at the stat level, these two tables
must be repeated for whatever number of cutting levels on rp one feels to be
necessary, probably three as indicated above.
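
A listing of the kind described for p-clusters can be sketched as below.
Table 4 is not reproduced here, so the layout (number, members, size,
percentage of the sample) is an assumption, as are the function name
p_cluster_listing and the use of the three Figure 3 p-clusters with a sample
of nine ids.

```python
from collections import Counter


def p_cluster_listing(p_clusters, n_ids):
    """Number the p-clusters in order of size and tabulate the sizes found."""
    ordered = sorted(p_clusters, key=lambda c: (len(c), sorted(c)))
    rows = []
    for number, cluster in enumerate(ordered, start=1):
        percent = 100.0 * len(cluster) / n_ids
        rows.append((number, sorted(cluster), len(cluster), percent))
    # Distribution by cluster frequency: how many clusters there are of each size.
    distribution = dict(Counter(len(c) for c in p_clusters))
    return rows, distribution


p_clusters = [frozenset("abefg"), frozenset("abcdh"), frozenset("abcdei")]
rows, distribution = p_cluster_listing(p_clusters, n_ids=9)
for number, members, size, percent in rows:
    print(number, " ".join(members), size, round(percent, 1))
print(distribution)
```

Repeating the call for each cutting level on rp would give the set of tables
asked for in the text.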

The investigator will want to know how far he can make inferences from
this sample result (our present taxonome program handles 140 cases) to the
population. It should be noted that stats are subject to the particularity
of the sample in two senses, first the ordinary sampling sense and secondly
by the dependence of the center and boundaries of the stats upon the id with
which one begins. As to the former, since no theoretical mathematical
statistical treatment is yet available investigators had best develop
estimates of standard errors for sampling by Monte Carlo methods. As to the
latter, which will become clearer as we discuss aits, the problem arises
from the fact that the center and boundary of a stat depend upon the id
with which we happen to start the process.

The final list of stats will escape any bias from this source on the
alternative "cartet" procedure, and it will do so in the id-similarity
procedure here too, because all possible commencement points have been
included. But it does so at the cost of generating a possibly bewildering
number of overlapping stats in the p-cluster list above. For the number of
p-clusters, namely ( g ) where x is the number encircled at the given
distance diameter, could decidedly exceed the original number of ids! Tables 4
and 5 are for a small example: with one of moderate size the investigator
may well ask whether the procedure was intended to produce data reduction!

To use the stat lists the investigator will need to look at the
distribution and ask what fraction of the population he wants in types. He
must also remember that a large cluster really means a dense cluster, since
all p-cluster diameters are the same. Possibly he will want to use the
non-overlapping highest density clusters which cover at least 60 per cent
of the population. Or again he may want n-clusters simultaneously above a
certain density (size) and a certain p-cluster overlap frequency. For
example, by rejecting from List 1 (Table 4) all p-clusters with fewer ids
than are shown by the two or three largest orders, one would get just two
types (dotted circles) in A, Figure 4, and two or three at the heart of B.
The decision must depend upon texture, and here texture begins to assume a
definable meaning. It resides in the evidence of the p- and n-clusters in
the stat list (Tables 4 and 5) as to how people are distributed between
small and large clusters, how much overlap occurs respectively with small
and large, and whether any hierarchical, dendritic structure is apparent.

Let us now turn to locating aits (segregates). We are bound to
begin with stats, yet utilizing this information is like seeking to locate
the objects in a large picture in a darkened room with a flashlight throwing
only a small circle of light. The circles -- the p-cluster stats --
will pick up the object only piecemeal and a method will be necessary to
put the pieces together.

Consider a simplified case as in Figure 4, with people spaced as
shown, yielding two dense segregates A and B on an otherwise "dilute" field.
Let us assume search is made with three levels of rp cut-off, namely, +0.8,
+0.5 and +0.3 corresponding, in two-dimensional space, approximately to
circles of the sizes shown. The first will give practically no p-clusters
in the field, since only in the A and B clumps will it span two cases. The
lowest cutting point (+0.3), on the other hand, will bring every one of the
ids into one cluster or another, as illustrated by the span of its circle at
the top left. If we followed through with this, as we have with the middle
value circles (0.5), the whole space would be covered with circles
representing p-clusters, though the n-clusters would only appear where the A and
B segregates stand.


(Insert Figure 4 here)

At this point the question might be raised whether an n-cluster is
not conceptually equivalent to an ait, but the answer must be no. For if
an n-cluster is confined to what is common to p-clusters of a certain size
it cannot itself exceed that size -- and an extended ait will commonly
need to do so. Nevertheless, and incidentally, one sees many instances in
the literature where investigators have adopted stat search procedures
despite their conceptualization of their problem clearly indicating that
they are looking for aits. It will help to clarify this point to observe
that in Figure 4 the aits are the masses A and B. In this case it happens
that by confining oneself to the larger stats, i.e., those at the top of
Tables 4 and 5, one finds the heart of these two segregated
masses. But it will not always be so, as a glance at a chain, as in
Figure 3(ii), will remind us. There the nuclear clusters are not central.
It must also be remembered that a larger number of people collected in a
stat by the above operations is not an indication that it is large (in the
sense of covering large areas of behavior) but only that people are very
dense in the given region -- which is possibly quite small. Always it
must be borne in mind that in a very extended ait the last members may have
negligible, zero or even negative resemblance to the first. For example,
it might be said of a certain religious group X that it has a tremendous
range of values and practices, so that despite continuity and coherence in
the chain of resemblance of members an extreme X may be more like a member
of another religion, Y, than like members at the other wing of his own
religion. This statement is illustrated by B3 and C1 members being, in
Figure 1, in the same stat, No. 2, but in different aits. Despite this
lack of homogeneity present in the stat the recognition of aits is important
in many aspects of social, educational and clinical psychology.

The operation we have devised for objectively locating segregates
consists of first finding stats and then setting up a stat contiguity matrix,
very similar to the Q matrix of linkage among ids, except that it now
represents linkage (interpreted as a sufficient degree of overlap) among
p-clusters. Before this Qc (relations among clusters) matrix can be set
up, one must settle, from the evidence on the general texture of the
domain given by the equivalents of Tables 4 and 5 above, on: (1) the
cutting limit of rp; (2) the densities (numbers of ids in a stat) to be
accepted (clusters of only 2 and 3 persons would normally be rejected as
too unstable); and (3) the amount of overlap to be accepted as evidence
of linkage (one id might be too subject to sampling variation; an overlap of
2, 3 or more seems more appropriate).
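
Once limits (2) and (3) are fixed, setting up the contiguity matrix is
mechanical. The sketch below assumes a minimum cluster size of 4 and a minimum
overlap of 2, in line with the suggestions just made; the function name
contiguity_matrix, the parameter names and the four p-clusters are
hypothetical.

```python
def contiguity_matrix(p_clusters, min_size=4, min_overlap=2):
    """Drop small p-clusters, then link any two that share at least min_overlap ids."""
    kept = [c for c in p_clusters if len(c) >= min_size]
    n = len(kept)
    qc = [[1 if (i == j or len(kept[i] & kept[j]) >= min_overlap) else 0
           for j in range(n)]
          for i in range(n)]
    return kept, qc


# Hypothetical p-clusters of numbered ids; the last is too small to be accepted.
p_clusters = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({6, 7, 8, 9}),
    frozenset({10, 11}),
]
kept, qc = contiguity_matrix(p_clusters)
print(qc)
```

The first two clusters share two ids and are linked; the second and third
share only one id, which under these limits is not accepted as evidence of
linkage.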

The taxonome program as now set up accepts its instructions on these
limits from values inserted for the particular problem by the experimenter
and then presents a Qc incidence matrix among p-clusters. But from that
point on, the search made in Qc is quite different from the Boolean Cluster
Search Algorithm used in Q0 for finding stats. Now we are no longer interested
in maintaining the condition that every member (a member now being itself a
cluster) shall be linked with every other. Instead we are interested in
segregating all the ids (clusters) which are continuously connected with one
another through any intermediate ids maintaining the stipulated degree of
resemblance. The procedure now requires that we go down a column of Qc, find
the other ids (in this case clusters) linked to it, and then pursue all its
connections, and so on for further additions to the family. Thus even the
shape of an octopus would be recognized by this procedure, provided the
tentacles at no point get so thin as to preclude visible overlap -- by the
stat size which means "visibility". This we may call continuous connectedness
analysis. The further issues of texture tactics and boundaries presented by
such problems as this last will be discussed in a moment, but first the main
"Segregate Search" procedure will be described.

(Insert  Table  6  here) 

Again the program employs Boolean algebra concepts. The investigator
(or, in our program, the computer) proceeds systematically from column 1 down
the other columns of an incidence matrix, Qc. This is derived from the data
of the earlier (individual person) example, summarized in Tables 4 and 5, via
a pre-incidence matrix, (a), in Table 6, which gives the numbers involved in
the cluster overlaps. Proceeding down the first column of the Qc matrix one
accrues the ids in the rows corresponding to the incidence signs. At each
such id one runs across the row and accumulates new columns where incidence
signs occur, following these likewise across rows which are not null. Thus
in Table 6(b), columns 1, 2 and 3 begin to form a segregate but the
intersection of this with 4, 5 and 6 is null. Starting again with 4 one finishes
with 4, 5 and 6. Illustrated in Boolean terms, if the columns were as in
(a) the Boolean product would be zero, and we should proceed no further.
In (b) on the other hand, it is not null, so we proceed to Boolean addition
to form the new segregate, shown in the last column of Table 7.
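
The continuous connectedness search can be sketched with sets standing for the
columns of Qc: a non-null Boolean product is a non-empty intersection, and
Boolean addition is a union. Tables 6 and 7 are not reproduced here, so the
six-cluster matrix below is a hypothetical one arranged to give the two
segregates mentioned in the text, and the function name segregates is an
assumption.

```python
def segregates(qc):
    """Group clusters that are continuously connected through links in Qc."""
    n = len(qc)
    columns = [frozenset(i for i in range(n) if qc[i][j]) for j in range(n)]
    found = []
    assigned = set()
    for start in range(n):
        if start in assigned:
            continue
        segregate = set(columns[start]) | {start}
        changed = True
        while changed:
            changed = False
            for col in columns:
                # Non-null Boolean product with the segregate: add the column to it.
                if segregate & col and not col <= segregate:
                    segregate |= col
                    changed = True
        found.append(sorted(i + 1 for i in segregate))  # 1-based cluster numbers
        assigned |= segregate
    return found


# Hypothetical Qc: clusters 1-2 and 2-3 overlap, as do 4-5 and 5-6.
qc = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
aits = segregates(qc)
print(aits)
```

Clusters 1 and 3 are not directly linked, yet they fall in the same segregate
because both are linked to cluster 2; this is the respect in which the search
differs from the one used for stats.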

(Insert  Table  7  here) 

Obviously, the detail in the picture of segregates will, as in a
photograph, depend on the size of the grain. A glance at Figure 5 will show that
if the smallest circle (rp = 0.8) were used the isthmus between the two parts
of the dumbbell shaped A segregate would not appear; though, on the other hand,
a gain would result from certain fringe persons around A and B being dropped
who perhaps could be said not really to belong.

It may be asked why the search for aits is not carried out by applying
what we have called the continuous connectedness analysis (Table 7) directly
to resemblances of individuals in the manner that the Boolean Cluster Search
has been used for matrix Q0 (Table 3), or Table 2. Our answer is that a
single individual is altogether too slender a datum, in view of sampling error,
upon which to rest connectedness. Thus, at the cluster search stage the
elimination of smaller, e.g., two-man, clusters from List 1 (Table 4) is
likely to take care of sampling error "artefacts" in the original data,
whereas it would be difficult to eliminate one-man threads in the continuous
connectedness analysis. Accordingly it has seemed better to locate stats and
then use these as units in recognising the continuous connectedness sought
in aits. However, more could be said, and certainly more needs to be done in
the way of experiment upon the effects of adjusting the size of stat diameter
to the texture of the domain, when seeking aits.


9. TRIAL OF TAXONOME ON REAL DATA AND PLASMODES

A description of the technical flow chart of the computer program built
by us on the above principles is set out elsewhere (Cattell and Coulter,
This Journal, p. ). It is to be hoped that others, in experimenting with
its use, will develop ways of finding the best parameters in the program
suitable for various textures and kinds of data. Here we report only on two
sufficiently diverse practical examples to show that Taxonome works to a
reasonable degree. A trial of the algorithm, but by desk computer, was made
by Cattell (1950) soon after devising rp, on an example of general interest,
namely, the classifying of national culture patterns into types of
"civilizations," to check on Toynbee's speculations. Using a twelve factor
profile for each of 69 countries Cattell obtained some ten phenomenal clusters
centering on two nuclear clusters. Four of the former are set out in Table 8
for illustration.


(Insert  Table  8  here) 

It will be seen that these blindly statistically obtained stats make
sense in terms of the usual socio-historico-anthropological evaluations.
Thus encouraged, we proceeded (albeit with too many interruptions) to the
present Taxonome, which is now being tried by us on a number of plasmodes.
(Plasmodes have been defined (Cattell, 1966) as arrangements of specific
numerical values to fit a mathematico-theoretical model. They are useful
for gaining new insights into the working of a model and for trying out
computer programs intended to analyse data according to such a model.)

While waiting to complete studies on strategically chosen plasmodes we
decided to try a nursery model, using as data 29 vessels from "Jane's
Fighting Ships" (1964-65) representing four distinct types of craft:
aircraft carriers (5), destroyers (4), submarines (10) and frigates (10).
Twelve measures were used in the profile of each, for the rp calculations:
(1) displacement; (2) length; (3) beam; armament in number of (4) light,
(5) medium, (6) heavy and (7) very heavy guns; (8) the complement;
(9) maximum speed; (10) submersibility; (11) continuity of deck construction;
and (12) whether no, some or many aircraft were carried.


(Insert Table 9 here)

The incidence matrix (Table 9(a)) suggests to the eye that the breakdown
into four classes will be reasonably good, but the actual p-cluster
output (Table 9(b)) indicates 9 clusters. Three of these are clearly the
destroyers, submarines and frigates, but the aircraft carriers have broken
into 3 p-clusters which, later, however, yield a single nuclear cluster.

Further, more complete applications, which cannot be described in this
introductory paper, are being reported elsewhere.


10. SUMMARY

(1) The most useful general concept of a type requires that it be
defined as the central profile in a high, "nodal" frequency (unusually high
density) of individuals in a multi-dimensional distribution.

(2) Two sub-concepts can be operationally distinguished within the
notion of type so defined: (a) the stat (for homostat), a homogeneous
group in which each member stands at less than a given distance (the same
for all) from all other members, and (b) the ait (or segregate), a
continuous but not homogeneous group in which each member is nearer to at
least one other member than he is to ids outside the group.

(3) Stats (homostats) and aits (segregates) can be found by either
"inter-id relation" or "density in space" (cartet count) methods, the former
being pursued here. This requires a measure of similarity (the opposite of
distance apart in the given space) for every pair of ids (i.e., persons,
groups, processes, etc.). Reasons are given for preferring as a similarity
index the family of profile similarity coefficients (rp, rn, rs, etc.) to the
correlation coefficient, Mahalanobis' D, or other coefficients sometimes
proposed for this purpose.

(4) Similarity can be considered in regard either to (a) some specific
criterion performance or averaged group of performances, which leads to
classification of ids by their effects or works, or to (b) general purpose
dimensions, resting on the concept of sampling a personality sphere or a
population of variables, which implies classification according to the
"thing in itself".

(5)  In  the  last  resort  these  need  the  same  mathematical  treatment,  since 
even  the  "thing  in  itself"  concept  implies  some  weighting  in  the  personality 
sphere.  Formulae  are  presented  for  inter-id  similarity  indices  based  on  the 
principal  useful  alternative  assumptions,  e.g.,  regarding  linear  and  parabolic 
relations  to  criteria,  and  generalizing  the  original  profile  similarity 
coefficient  rp  to  any  correlations  among  profile  elements  and  any  weights. 

(6) The discovery of stats begins with a Q-matrix of rp's among ids.
At each of two or three cutting points for rp this is converted to an
incidence matrix. A Boolean algorithm, based on what was called the
"ramifying linkage method", objectively sorts the data into phenomenal
clusters. An operational distinction has to be made between phenomenal (p-)
clusters and nuclear (n-) clusters, which have quite different properties.
The conclusion of the search for stats consists of one list of phenomenal
clusters, by size and specific members, and one list of nuclear clusters,
by size, number of overlapping clusters involved, and specific members.
These lists, which give the "texture" of the domain, can be voluminous and
require that the investigator select an importance level to reduce the
number of concepts to be handled.

(7) The discovery of aits (segregates) begins with a Qc matrix of overlap
among phenomenal clusters which is converted to an incidence "contiguity"
matrix and operated upon by a Boolean analysis for continuous connections.
Experiment is needed to find the best rules for size of stats to be used
in seeking aits.

(8) The concepts and principles of analysis have been incorporated
in a computer program (for the IBM 7094 initially) which has been shown to
work on two concrete examples, though experiment on others, adjusting the
parameters optimally, especially to minimize sampling effects, remains to
be done. Unless a theoretical mathematical-statistical solution is soon
found, Monte Carlo methods should be employed to establish sample inference
limits in this field.

(9) Over and above the finding of particular stats and aits in a search
for types, the Taxonome method aims to describe the texture of a domain. We
have referred to texture by the analogy of the meteorologist's use of cumulus,
alto-stratus, etc., to describe cloud formations. Segregates can appear as
small or large, even or unevenly spaced, massed or in chains, etc.
Operationally, texture will broadly be defined by comparisons of structure at
different cutting levels, by the ratio of nuclear to phenomenal clusters, by
the degree of compactness* of aits, and by the amount of hierarchical structure
discernible among them, as in the biologists' dendrograms. The ascertaining
of the last has not been described in detail, but clearly involves a
"second-stratum" repetition of the type search carried out upon the patterns
representing the central tendencies in the type groupings first found.

* An index of compactness can be obtained by dividing the total number of ties
(incidence matrix) involved in a segregate by the total number possible,
nC2, where n is the number of ids involved in the segregate.

(10)  The  empirical  search  for  types  will  naturally  need  to  proceed 
hand  in  hand  with  inductive  and  deductive  theory  development  on  the  origins, 
interactions  and  natural  history  of  types.  A  theory  of  three  sources  of 
type  structures  is  stated  and  one  of  them  suggests  that  the  use  of  type 
concepts in psychology is likely to become tied to the development of
nonlinear specification equations.
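
Two of the operations summarized above lend themselves to a brief sketch in
Python: the conversion of a Q-matrix of rp's to an incidence matrix at a
cutting point (point 6), and the index of compactness of a segregate (the note
to point 9). The function names and the small Q-matrix are hypothetical, and
treating a value equal to the cutting point as an incidence sign is an
assumption made only for this illustration.

```python
from math import comb


def incidence_matrix(q, cut):
    """Convert a similarity matrix to 0/1 incidence at one cutting point.

    Assumption: a similarity equal to the cutting point counts as a tie.
    """
    return [[1 if value >= cut else 0 for value in row] for row in q]


def compactness(incidence, members):
    """Ties actually present among `members` divided by the nC2 possible."""
    n = len(members)
    if n < 2:
        raise ValueError("compactness needs at least two ids")
    ties = sum(
        1
        for i, a in enumerate(members)
        for b in members[i + 1:]
        if incidence[a][b]
    )
    return ties / comb(n, 2)


# Hypothetical rp values among four ids (not taken from the paper).
Q = [
    [1.00, 0.70, 0.60, 0.10],
    [0.70, 1.00, 0.20, 0.00],
    [0.60, 0.20, 1.00, 0.55],
    [0.10, 0.00, 0.55, 1.00],
]
INC = incidence_matrix(Q, cut=0.5)
CHAIN = compactness(INC, [0, 1, 2, 3])  # 3 ties out of 6 possible
print(INC)
print(CHAIN)  # 0.5
```

A chain-like segregate such as this one has a low index, whereas a homostat,
in which every pair is tied, has an index of 1. Repeating the conversion at
two or three cutting points gives the comparison of structure at different
cutting levels mentioned in point 9.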

The  writers  gratefully  acknowledge  that  this  investigation  was  supported 
in  part  by  Public  Health  Service  Research  Grant  No.  MH  1733-09.  They  are 
indebted  also  to  Professor  Peter  Schoenemann  for  help  and  advice  on  the  early 
stages of the program.


Table 1. Values from Distribution of Random rp's Obtained by
Monte Carlo Methods

(Using normal distribution on each element of profile.)


Algebraic Mean

  N/k        2       6      10

   25      .188    .135    .052
   50      .174    .047    .011
   75      .114    .046    .017
  100      .100    .031    .011


Mean of Positive Values Only

  N/k        2       6      10

   25      .501    .307    .223
   50      .479    .263    .197
   75      .450    .263    .202
  100      .449    .257    .199
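
A Monte Carlo trial of the kind tabulated above might be sketched as follows.
Several things here are assumptions and are not taken from this section: the
form of the coefficient, rp = (Ek - sum of d squared) / (Ek + sum of d
squared), with Ek twice the median chi-square for k profile elements; the
Wilson-Hilferty approximation used for that median; and the reading of N as
the number of ids and k as the number of profile elements. The sketch is not
expected to reproduce the entries of Table 1, whose exact procedure is not
described here.

```python
import random
import statistics


def chi_square_median(k):
    """Approximate median of chi-square with k d.f. (Wilson-Hilferty)."""
    return k * (1 - 2 / (9 * k)) ** 3


def rp(profile_a, profile_b):
    """Profile similarity coefficient in the assumed form described above.

    Both profiles are taken to be in standard scores.
    """
    k = len(profile_a)
    e_k = 2 * chi_square_median(k)
    d2 = sum((x - y) ** 2 for x, y in zip(profile_a, profile_b))
    return (e_k - d2) / (e_k + d2)


def simulate(n_ids, k, seed=0):
    """Draw n_ids random normal profiles of k elements; summarize all rp's.

    Returns (algebraic mean, mean of positive values only).
    """
    rng = random.Random(seed)
    profiles = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_ids)]
    values = [
        rp(profiles[i], profiles[j])
        for i in range(n_ids)
        for j in range(i + 1, n_ids)
    ]
    positive = [v for v in values if v > 0]
    mean_positive = statistics.mean(positive) if positive else None
    return statistics.mean(values), mean_positive


MEAN_ALL, MEAN_POSITIVE = simulate(n_ids=25, k=6, seed=1)
print(MEAN_ALL, MEAN_POSITIVE)
```

Repeating such runs over many seeds would give an empirical distribution of
random rp's against which an obtained value could be judged.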

Table 2. Incidence Matrix for 15 People

[Table 2 body: 15 x 15 incidence matrix among persons a to o, with an entry
of 1 for each incidence sign; the cell positions are not recoverable.]


Table 3. Process Sequences in the Boolean Algorithm for
Phenomenal Cluster Search


(1)
        1  2  3  4  5  6  7  8  9 10

   1    1  0  0  0  1  0  1  0  0  0
   2    0  1  0  0  0  1  0  0  0  0
   3    0  0  1  0  0  0  1  0  0  0
   4    0  0  0  1  0  0  0  0  0  0
   5    1  0  0  0  1  1  1  0  0  0
   6    0  1  0  0  1  1  1  0  0  0
   7    1  0  1  0  1  1  1  0  1  0
   8    0  0  0  0  0  0  0  1  0  0
   9    0  0  0  0  0  0  1  0  1  1
  10    0  0  0  0  0  0  0  0  1  1

<*o 

<•> 


I  II  Ill  IV  V  12345 

110000  110000 

201000  II  01000 

300100  III  00100 

400000  IV  00000 

510000  V00000 

6  0  1  0  0  0 

7  10  110 

8  0  0  0  0  0 

9  0  0  0  1  1 

10  0  0  0  0  l 

Gj  •  G 


6  789  10  123456789  10 

0  1  0  0  0  11000101000 

1  0  0  0  0  20100010000 

0  1  0  0  0  30010001000 

0  1  0  1  0  4000x000000 

0  0  0  1  1  510001x1000

Table 3 Continued

1  2  3
2  0  0  0
4  0  0  0

Table 4. Phenomenal Clusters Discovered

Table 5. Account of Nuclear Clusters

Table 6. Finding Segregates by the Continuous Connectedness Algorithm


(a) Phenomenal Cluster Contiguity Matrix, Qc

Phenomenal
Cluster
Identifying      1     2     3     4     5     6
Numbers         (6)   (5)   (5)   (4)   (3)   (3)

  1 (6)          6     3     4     0     0     0
  2 (5)          3     5     3     0     0     0
  3 (5)          4     3     5     0     0     0
  4 (4)          0     0     0     4     2     2
  5 (3)          0     0     0     2     3     0
  6 (3)          0     0     0     2     0     3

Entries state the count of overlap of persons.


(b) Incidence Matrix among Phenomenal Clusters

Identifying
Numbers          1     2     3     4     5     6

  1              1     1     1     0     0     0
  2              1     1     1     0     0     0
  3              1     1     1     0     0     0
  4              0     0     0     1     1     1
  5              0     0     0     1     1     0
  6              0     0     0     1     0     1

Converted to Incidence Matrix for 2 overlap and above.

(c) Segregates Discovered by Segregate Search Algorithm Applied to (b).


Table 7. Boolean Algorithm for Continuous Connectedness Search

Table 8. Nuclear Types Found Among Nations by Culture

Table 9. p-Cluster Search Stage of Taxonome Illustrated on Jane's Fighting Ships

[Table 9(a) body: 29 x 29 incidence matrix among vessels 1 to 29, with columns
grouped by type of craft (Carriers, Destroyers, Submarines, ...); the cell
positions are not recoverable.]


Table 9 Continued


p-Clusters after One Cycle


Comparative Cluster Analysis of Variables and Individuals:
Holzinger Abilities and the MMPI


Robert C. Tryon

University of California, Berkeley


The first objective in comparative cluster analysis is to
describe the similarity of the dimensions discovered in different
groups. This problem is known as the comparative dimensional
analysis of variables or "factor matching". In the domain of
the intellectual abilities, for example, one may discover in a
middle-class suburban group of children that the 24 Holzinger
tests of diverse specific abilities (Holzinger and Swineford,
1939) can be accounted for by four "basic" general abilities or
factors, Verbal, Space (Form), Speed and Memory, symbolized as
V, S, F and M. Are these dimensions identical with those found in
a lower-class school of children of factory workers? An MMPI
example: Are the seven general dimensions of Introversion, Body
Symptoms, Suspicion, Tension, Depression, Resentment and Autism
found in a group of psychiatric patients the same ones discovered
in a group of normals?

This problem has a direct, simple solution when approached
by the logic and procedures of cluster analysis based upon domain
sampling principles and incorporated procedurally in the BC TRY
Computer System of cluster and factor analysis (Tryon and Bailey,
1965). Since dimensional analysis requires as basic data the
intercorrelations between the variables in the groups, you might
reasonably ask this question: How can one compute the correlations
between variables of different groups of subjects? The answer is
that in comparative dimensional analysis all that is needed are
the factor coefficients of the dimensions within each group
(these being referred to in factor analysis as the "rotated
oblique factor coefficients"). These factorial findings within
the different groups are directly compared across the groups by
the comparative cluster analysis programs called COMP1 and COMP2
of the BC TRY System.

The second general objective is that of comparing the
typologies of two or more groups of individuals. When, for
example, we score the Factory and the Suburban subjects on the
four general abilities, V, S, F and M, we can objectively sort
the children in each group into different types based upon the
patterns of their scores on V, F, S, and M. These person-clusters
(or profile types) in the two groups can differ in two ways. First,
even though the same kinds of profile types may appear in the two
groups, those that occur with high frequency in one group may be
rare in the other group. We may refer to this type of typological
similarity across groups as the similarity of their
"frequency-patterns" on a common typology. Second, the kinds of types
in the two groups may be different; those that compose one group may
not match the types of the other group. In the BC TRY System, the
programs expressly designed to perform the comparative typology
of groups are the components OTYPE, OSTAT and EUCO.

The plan of this paper is as follows: The comparison of the
dimensions of different groups (COMP) and of their typologies (OCOMP)
will first be made for the case of the Holzinger study of the
abilities of two groups, the Factory and the Suburban children.
Under exactly the same format of analysis you will then find the
COMP and OCOMP analysis of the Patient and the Normal groups in
a study of MMPI item-clusters. Our interest in these two studies
is as much substantive as procedural, because they refer to two
important problems in cognitive and personality psychology.

The Study of Abilities: The Holzinger Problem

Comparative dimensional analysis ("matching factors" or COMP analysis)

The 24 variables.--In the Holzinger problem, 301 grade school
children were given 24 separate tests of specific abilities. These
tests are listed in Table 1, where you will note that they are
grouped under the five domains of Spatial, Verbal, Speed, Memory
and Mathematical. Most of these tests may be recognised as forms
that are included today in test batteries of "intelligence", such
as, for example, the WISC and WAIS batteries from which the Verbal,
Performance and Full Scale IQs are determined (Anastasi, 1961,
Chapter 12).

The groups.--The total group of children, here called the
Inclusive group, were children from two Chicago grade schools.
The authors (Holzinger and Swineford, 1939, p. 6) describe them
as follows: "The children in the Pasteur School came largely from
the homes of workers in near-by factories. Many of the parents
were foreign-born, using their native language at home... Both
parents were American-born in 29 per cent of the cases, while in
I4.8 per cent, both were foreign-born." The second school was the
Grant-White school in the suburb of Forest Park, Ill. In this
group "...both parents were American-born in 72 per cent of the
cases while both were foreign-born in only 15 per cent. Almost
100 per cent of the children were born in the suburb in which the
school was located."

The Inclusive Group can therefore be thought of as being
composed of two ecological groups. The 156 from the Pasteur
School will be here called the Factory Children, the 145 from the
Grant-White School, the Suburban Children. The data from this
last Suburban group have been made famous as a basic data-set in
factor analysis history, being known as "The Holzinger-Harman
Problem" (Harman, 1960). The Inclusive Group has other subgroup
structures, notably sex groups and grade groups. Furthermore,
the Suburban Children were organized into two types of classrooms,
"homogeneous groups" and random classes.

Dimensional analysis of the 24 variables in the Inclusive
Group.--A direct comparison of the dimensions of the 24 variables
in the Factory and Suburban Children and of their separate
typological structures can only be made when the definers of their
dimensions are the same. The first objective, therefore, is to
decide on the number of dimensions on which the subgroup comparisons
are to be made, and on a common set of definers of each dimension.
A full-cycle key cluster analysis of the 24 variables (Tryon &
Bailey, 1965, Table 1, Section B) was performed on the Inclusive
group, from which it was discovered that after four dimensions
were extracted from the intercorrelations among the 24 tests,
their residuals were trivial. Many different varieties of factor
analysis have been performed on the correlations of the Suburban
Children, all of which also find four salient dimensions (Harman,
1960).

The defining variables of each of the four dimensions are
shown by superscripts attached to the names of the variables in
Table 1. Thus, under the Spatial category all four of the spatial
tests are marked with super "f", indicating that each is a definer
of one dimension, the F dimension, measuring form (or space)
perception. Analogously, four "v" tests define the V or Verbal,
four "s" the S or Speed, and five "m" the M or Memory dimensions.
No fifth dimension was required for the mathematics tests.
Dimensions V, S, F, and M are thus designated as the "basic"
general dimensions of the 24 abilities, on which the comparative
dimensional and typological analyses of subgroups are performed.
Details on the dimensional analysis of the Inclusive group are
not given here for two reasons: They have been recently published
elsewhere (Tryon, 1966b), and they are so similar to those of the
Factory Children which are given below (see Fig. 1, bottom) that
no useful purpose is served by presenting them.

Dimensional analysis of the 24 variables in the Factory
Children.--To discover the cluster structure of the tests in the
Factory Children, a full-cycle key-cluster solution of this
group's intercorrelations among the 24 tests was "preset" on the
definers of the four basic dimensions found in the Inclusive Group.
The results are shown pictorially in Fig. 1, the bottom spherical
plot, which is a direct tracing of the printout of the diagram in
program SPAN (Spherical Analysis) of the BC TRY System. The
surface separation of any two tests on this sphere is a function
of the correlation between them (technically, of their
"inter-domain", or "common-factor correlation"). Two tests that
correlate unity have superimposed points, two that correlate zero are
90° apart, represented in Fig. 1 by the distances between the
three boxes that form the spherical triangle; the boxes represent
the subset of three independent dimensions derived by factoring
on residuals.

Note in Fig. 1 that the five Verbal tests cluster tightly
together at lower left in the configuration, the four Speed tests
more loosely at lower right, the four Form tests at the top. The
six Memory tests are marked by "X", denoting that they all
project into a fourth dimension which cannot be shown since
it projects at right angles to the three depicted in Fig. 1. Note,
however, that the five mathematical tests are depicted in these
three dimensions, and that they are all "dependent" on V, S, F and
M in the sense of being predictable from the four, a point proved
in a recent paper (Tryon, 196?).

For readers in whom the thought may lurk that this clear
cluster structure is due to "presetting" on the definers of the
Inclusive group, it is regrettable that space does not permit
showing the configuration recovered by a purely blind empirical
key-cluster factoring of the Factory correlation matrix. To do
so would, however, be redundant because the empirically-derived
configuration differs only trivially from that shown in this
preset solution. The same configuration also results from an
orthodox principal-axes solution plus varimax or quartimax
rotation, also available in the BC TRY System. Indeed, the
configuration is necessarily the same for all varieties of
factoring on a given set of dimensions that result in trivial
residuals.

Dimensional analysis of the 24 variables in the Suburban
Children.--Applying the same dimensional procedure to the
correlation matrix of the Suburban Children gives, as a result,
the configuration shown in the top SPAN diagram of Fig. 1. At
lower left in the configuration is the same Verbal cluster as
in the Factory group, at lower right Speed, at the top Space,
and the Memory cluster also projects into a fourth dimension;
the mathematical abilities once again deploy centrally as
dependent variables predictable from the V, S, F, and M dimensions.
Clearly the cluster structure of the Suburban Children closely
resembles that of the Factory. One obvious difference is that,
though the cluster groups are about the same, they are, as groups,
more separated from each other in the Factory than in the Suburban
Children, that is, less correlated with (oblique to) each other.

Comparison of the dimensions within each group separately
(COMP1).--A metric description of the within-group structures is
provided by a program that computes the correlations between the
ability clusters defined as oblique dimensions, computed by the
CSA (Cluster Structure Analysis) program of the BC TRY System.
The values of these correlations are given in Table 2, section A,
where you see the correlation matrix of the V, S, F, and M
dimensions. These correlations are known in factor analysis as the
"correlations between rotated oblique factors", or their "common
factor correlations". In cluster analysis they are called
"inter-domain" correlations, where each cluster is conceptualized as a
domain score on many variables collinear with the observed
definers of the cluster (Tryon, 1959, equation 2]*). Thus, the
domain score on the Verbal cluster is a hypothetical score
on many variables collinear with the observed set of five Verbal
tests shown in the SPAN diagram. (The term "collinear"
means projecting to the same degree on the same vector from the
origin of the sphere.)

The inter-domain correlations, listed in Table 2 under the
columns headed rcc, are computed from the raw correlation matrix
using the well-known formula for the "correlation of sums". As
you look through the rcc values you find precise metric expressions
of the degree of similarity of the four basic ability-dimensions,
V, S, F and M, in the Suburban and Factory Children.
For example, the inter-domain correlations between the Verbal and
Speed dimensions in the two groups (Table 2, section A) show that
the two dimensions have almost exactly the same degree of similarity
in the two groups. But between the other dimensions you will
find that the rs are generally higher for the Suburban than for
the Factory children, a fact already seen visually in the SPAN
diagrams of Fig. 1. The rcc values are thus a metric statement
of similarity that is displayed visually on the spheres.
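
The "correlation of sums" mentioned above can be illustrated with a short
Python sketch. It assumes standardized variables and unities in the diagonal
of the correlation matrix; whether the BC TRY programs use unities or some
other diagonal value is not stated in this section. The function names and the
small correlation matrix are hypothetical.

```python
from math import sqrt


def block_sum(r, rows, cols):
    """Sum of the correlations in the block picked out by rows and cols."""
    return sum(r[i][j] for i in rows for j in cols)


def correlation_of_sums(r, set_a, set_b):
    """Correlation between the unit-weighted sums of two sets of variables.

    Numerator: sum of the cross-correlations between the sets.
    Denominator: square root of the product of the two within-set block
    sums (each block sum is the variance of that set's summed score).
    """
    cross = block_sum(r, set_a, set_b)
    within_a = block_sum(r, set_a, set_a)
    within_b = block_sum(r, set_b, set_b)
    return cross / sqrt(within_a * within_b)


# Hypothetical correlations: variables 0, 1 define one cluster; 2, 3 another.
R = [
    [1.0, 0.6, 0.3, 0.2],
    [0.6, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.5],
    [0.2, 0.3, 0.5, 1.0],
]
R_CLUSTERS = correlation_of_sums(R, [0, 1], [2, 3])
print(round(R_CLUSTERS, 3))  # 0.323
```

Here the cross-correlations sum to 1.0 and the two within-cluster blocks to
3.2 and 3.0, so the correlation between the cluster sums is 1.0 divided by
the square root of 9.6.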

In the lower sections of Table 2, you will find other
metric properties of the four basic ability dimensions. The
"generality" of each, given in section C, is the degree to which
each dimension accounts for all the raw inter-r's among the 24
abilities. In both groups the Verbal dimension is the most
general, but in the Factory group the other three dimensions are
more specific than in the Suburban. Of special interest to the
typological analysis is the reliability coefficient of the raw
scores on the four dimensions. In section D, the reliability of
V is .9, but of the other three, only of the order .7 or .8.
(The formula for reliability is known as alpha, though a better
term is the Variance Form (Tryon, 1957).)
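
For readers who want to see the alpha formula at work, here is a minimal
Python sketch. It uses the standardized form of coefficient alpha, computed
from the correlations among the definers of one dimension, which assumes unit
variances; a raw-score version would use the covariance matrix instead. The
function name and the correlation values are hypothetical.

```python
def alpha_from_correlations(r, items):
    """Standardized coefficient alpha for the composite of `items`.

    alpha = n/(n-1) * (1 - n / total), where total is the sum of all
    entries in the items' correlation block, i.e. the variance of the
    sum of the standardized items.
    """
    n = len(items)
    if n < 2:
        raise ValueError("alpha needs at least two items")
    total = sum(r[i][j] for i in items for j in items)
    return (n / (n - 1)) * (1 - n / total)


# Hypothetical correlations among three definers of one dimension.
R_ITEMS = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
ALPHA_TWO = alpha_from_correlations(R_ITEMS, [0, 1])
ALPHA_THREE = alpha_from_correlations(R_ITEMS, [0, 1, 2])
print(round(ALPHA_TWO, 3), round(ALPHA_THREE, 3))  # 0.75 0.818
```

With a fixed average correlation of .6, adding a third definer raises the
reliability from .75 to about .82, which illustrates why dimensions with more
or more highly correlated definers yield more reliable scores.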

Direct comparative analysis of the dimensions across groups
(COMP2).--To this point we have assessed the similarity of the
V, S, F and M dimensions of the Factory and Suburban Children by
the subjective process of cross-referencing their separate
configurations in Fig. 1, and by comparing their within-group rcc values
in Table 2, procedures that are indirect and inferential. Can we
directly compare their dimensions?

Fig. 2 displays the direct comparison achieved by the program
COMP2 of the BC TRY System. In this SPAN diagram, traced from the
printout, you will note that the Verbal dimension of the Suburban
Children, labelled VG (for the GW school), and that of the Factory
Children, labelled VP (for Pasteur), are tightly clustered at lower
left, meaning that they are quite similar. At lower right are
the two points representing the Speed dimensions of the two
schools; at the top you see their two Space dimensions, and
extending into the fourth dimension are their two Memory
dimensions. This cluster structure therefore directly compares
in one diagram the similarity of the two dimensional structures
that we only indirectly observed above by cross-referencing.

The direct index of the similarity of any two dimensions
across different groups is the "index of similarity" of the two
dimensions (or "factors"), called the cos θ between them. For
two dimensions within a group cos θ is equivalent to the
inter-domain correlation, rcc, but it is estimated not from the raw
correlation matrix, as is rcc, but from the oblique factor
coefficients of the two dimensions. The proof that cos θ between
two dimensions within a group is rcc is given in Table 2, section
A; there you will find in the columns labelled cos θ this index
of similarity (computed by COMP2) set beside the rcc value
(computed by program CSA). You will find that the two indices
are virtually identical in every case.

But since the similarity index, cos θ, is computed only
from factor coefficients, it can be, of course, calculated for
dimensions across different groups. These similarity values are
given in Table 2, section B. They tell the same story metrically
that is shown pictorially in the spherical configuration of
Fig. 2. On the upper left to lower right diagonal you see the
index of similarity of V in the Factory and in the Suburban
Children, then of S, F, and M. For example, that between the
Verbal dimensions in the two groups is .96, between the two
Speeds it is .99, between Forms .92, between Memories .83.

For the technically-minded reader I include in Appendix A
the logic and formulation of cos θ as an index of dimensional
similarity. Briefly, the reasoning by which we designate two
dimensions as identical is based on the universal logic by which
we conceive any two entities as being the same, namely, that
they show the same pattern of observations in relation to a common
set of other "referent entities". For example, the Verbal
dimensions in the two groups are virtually identical because
their patterns of factor coefficients (the observations) on the
constant set of 24 referent abilities are virtually identical.

The index of pattern similarity of any two entities on a common
set of referent entities is P, called the index of proportionality,
or collinearity, described in detail in Appendix A for the case
of pattern similarity of the factor coefficients of any two
dimensions. The value of the index of similarity, cos θ, of any
two dimensions in different groups is a simple quadratic function
of P, as shown in Appendix A.

To sum up, we find in Fig. 2, and from the metric values in
Table 2, that the four basic dimensions V, S, F, and M in the two
groups are highly similar. But in the Factory Children they are
somewhat more independent of each other than in the Suburban.

Why? An environmental explanation is that the parents of the
Suburban Children stress scholastic achievement, implementing
their ambition by pushing their "promising" children in all
abilities, letting their less promising children fend for
themselves. Consistent with this theory, we find that it is
precisely in the Suburban Children that the scholastic institution
of "homogeneous" classification is employed, namely, the sorting
of sheep and goats into different classrooms. In the Factory
group, children generally are left to fend for themselves.

But there is an alternative genetic explanation: There
probably is more stringent assortative mating on abilities among
Suburban parents. This sort of sexual selection would generate
a higher correlation among all abilities in the Suburban group
than in the Factory, where assortative mating would be more random.
A systematic treatment of such environmental vs. genetic
"correlation-producing" agencies in the case of abilities is
presented elsewhere (Tryon, 1935, 1939).

Comparative typological analysis in the Holzinger Problem (OCOMP
analysis)

When we allocate children having the same patterns of scores
on the basic abilities, V, S, F and M, to O-types, do we find the
same typological structure of these O-types in the Factory and
Suburban groups?

Similarity of frequency-patterns of the two groups on the
common typology of the Inclusive group.-- The first of two ways of
determining the typological similarity of two groups is to discover
the degree to which they show the same frequency of cases falling
in the common typology of the Inclusive Group. You will find this
common set of O-types in Table 3 under the general heading at
left, "Inclusive typology". (How these types are determined will
be described later.) Look at the first type, labelled H1,
consisting of 14 children whose pattern of cluster scores on basic
abilities, V, S, F, and M is [illegible], 36, 44, 37, respectively.
These are mean standard Z-scores on a scale whose mean for the 301
children in the Inclusive Group is 50, and sigma 10. Underlined
scores of 40 (-1 sigma) or below are termed "Low" in the column
headed "Descriptive name"; those 60 (+1 sigma) or above are
called "High". For this H1 type you will therefore find it
described in the table as "Low Speed and Memory".
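
The naming rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name describe_type, the "Hi" prefix for high scores, and the label "Average" for a profile with no extreme score are choices made here, and the Verbal and Form values in the example are placeholders rather than figures from Table 3.

```python
def describe_type(profile, low=40, high=60):
    """Name a type from its mean Z-scores (scale: mean 50, sigma 10).

    Scores at or below `low` (-1 sigma) are called Low; scores at or
    above `high` (+1 sigma) are called Hi. Hypothetical helper.
    """
    lows = [name for name, z in profile.items() if z <= low]
    highs = [name for name, z in profile.items() if z >= high]
    parts = []
    if lows:
        parts.append("Low " + " and ".join(lows))
    if highs:
        parts.append("Hi " + " and ".join(highs))
    # Assumption: a profile with no extreme score is labelled "Average".
    return "; ".join(parts) if parts else "Average"


# Speed (36) and Memory (37) are the values quoted for type H1;
# Verbal and Form (45) are placeholders, not values from Table 3.
h1 = {"Verbal": 45, "Speed": 36, "Form": 45, "Memory": 37}
print(describe_type(h1))  # Low Speed and Memory
```

Only the two cutoffs matter for the name; the exact placeholder values do not change the result as long as they lie strictly between 40 and 60.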

As you look down the column of types from H1 through H15
to the class called Unique you see in the adjacent frequency column
that some types have a high frequency, like H9, the Average type,
with 38 children in it, others a low frequency, like H2, the
Low Verbal and Memory type, with only eight cases in it. Our
logic of typological similarity of the Factory and Suburban
Children is simply this: If both groups show the same frequency
pattern on these common 16 Inclusive classes, then they have the
same typological structure, but to the degree that their
frequencies in these 16 classes differ from each other their
typologies are obviously different.

Before examining the findings, I will briefly review how
the typology of the Inclusive group is determined, a matter
published in some detail in a recent paper (Tryon, 1967a). The
cluster scores of each of the 301 children are first computed by
program FACS (Factor And Cluster Scores) of the BC TRY System.
For example, their scores on V Verbal are the mean of their
standard scores on the four defining variables of V (listed in
Table 1), restandardized on a scale of mean 50, sigma 10. The
program OTYPE inputs the 301 cluster scores and completely
objectively allocates them to the 16 classes given in Table 3.
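
The cluster score described here (the mean of standard scores on the defining variables, restandardized to mean 50 and sigma 10) can be sketched as follows. This is not the FACS program; the function names, the use of the population standard deviation, and the small data set are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def standardize(values, new_mean=0.0, new_sigma=1.0):
    """Rescale values to the requested mean and sigma.

    Assumption: the population standard deviation is used.
    """
    m, s = mean(values), pstdev(values)
    return [new_mean + new_sigma * (v - m) / s for v in values]


def cluster_scores(raw_by_variable):
    """Cluster score of each subject from several defining variables.

    raw_by_variable: one list of raw scores per defining variable,
    all lists covering the same subjects in the same order.
    """
    z_by_variable = [standardize(col) for col in raw_by_variable]
    # Mean standard score of each subject across the defining variables.
    composite = [mean(zs) for zs in zip(*z_by_variable)]
    # Restandardize the composite to mean 50, sigma 10.
    return standardize(composite, 50.0, 10.0)


# Hypothetical raw scores of five subjects on four defining variables.
raw = [
    [12, 15, 9, 20, 14],
    [30, 34, 25, 41, 33],
    [5, 7, 4, 9, 6],
    [18, 22, 15, 27, 21],
]
scores = cluster_scores(raw)
print([round(s, 1) for s in scores])
```

Standardizing each variable before averaging keeps a variable with a large raw-score range from dominating the composite.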

The principles of classification, called the Condensation Method,
are quite simple: All 301 scores are located as points in the
cluster score space of the four dimensions defined by the V, S,
F, and M [the remainder of this paragraph is largely illegible in
the source; the legible fragments refer to the Euclidean distances
between points and to clusters in this space, to the program OSTAT
(Object STATistics), and to an index of homogeneity, H, for each
cluster, with details in Tryon, 1967].

We now examine the similarity of the frequency-patterns of the
Factory and Suburban groups. From the OSTAT printout mentioned
above it is a simple matter to count how many children in
each type are Factory and how many are Suburban, from which the
percentage falling into each type is computed. These percentages are
listed in Table 3 under the general heading "Factory vs. Suburban".
The listed values in the two columns labelled "pF" and "pS" are
the frequency-patterns of the two groups, on the basis
of which their typological similarity is determined. The overall
index of similarity, given just below the table, is the
general index of proportionality P, discussed earlier, the
formula for which is printed below the table. If you inspect this
formula you will discover that if two groups have exactly the same
frequency-patterns, i.e., pF = pS, then the index is unity (1.00).
But if their patterns are utterly different, that is, if the
occurrence of each type in one group is matched with its absence
(p = 0) in the other group, then the index, P, is zero. I have
worked out the value of P for the two ecological groups of
children below the table, where you will find it to be P = .75,
denoting a considerable amount of typological similarity of the
two groups.
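
The two limiting cases just stated can be checked with a small sketch. The formula for P is printed below Table 3 and is not reproduced in this text, so the normalized cross-product used here is an assumed form, chosen only because it has the two properties described: it is 1.00 for identical patterns and zero when every type present in one group is absent in the other. The function name and the example frequencies are hypothetical.

```python
import math


def proportionality_index(p, q):
    """Assumed form of a proportionality index between two patterns.

    Computes sum(p*q) / sqrt(sum(p^2) * sum(q^2)). This is a stand-in
    consistent with the stated limits, not the formula of Table 3.
    """
    cross = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    return cross / norm


# Hypothetical frequency-patterns over four classes.
same = [0.25, 0.25, 0.50, 0.00]
only_first_two = [0.5, 0.5, 0.0, 0.0]
only_last_two = [0.0, 0.0, 0.3, 0.7]

print(proportionality_index(same, same))                      # identical patterns
print(proportionality_index(only_first_two, only_last_two))   # no shared types
```

Intermediate values arise when the two groups share types but in different proportions, which is the situation the reported P = .75 describes.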

Of greater interest, however, are the specific type
differences between the two groups. These values are listed
under "Diff" in Table 3. Because the sampling error of such
differences can be large, it is desirable to indicate which of
these differences is unlikely to occur by chance at the strong
confidence level of p < .001. Fortunately, we are working with
small values of per cents, which keep the error down. Expressing
the per cents as proportions, we note that the mean proportion
in the 16 classes is Σp/16 = 1.00/16 = .06. Since most of the
proportions of the types in both groups are not too greatly
different from .06, we will compute the standard error of a
difference between two true proportions of .06, using the well-
known formula for this error printed at the bottom of the table,
and worked out for the Ns of the two groups. It comes to 3. We
may therefore set a per cent of 3 as the lower bound at and above
which any difference is almost surely non-chance.
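
As a rough check on this figure, the usual standard error of a difference between two independent proportions with a common true value can be worked out directly. The formula below is the textbook one and is assumed to be the "well-known formula" referred to; the group sizes of 156 and 145 (summing to the 301 children of the Inclusive Group) are likewise assumptions for illustration.

```python
import math


def se_diff_proportions(p, n1, n2):
    """Standard error of the difference between two sample proportions
    drawn from groups of size n1 and n2 with the same true proportion p."""
    return math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))


# Assumed inputs: true proportion .06, group sizes 156 and 145.
se = se_diff_proportions(0.06, 156, 145)
print(round(100 * se, 1))  # 2.7 per cent, i.e. about 3
```

Under these assumptions the value rounds to the 3 per cent used as the lower bound in the text.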

You will find all differences above 3 indicated by an (S) for
Suburban or an (F) for Factory, depending on which group has the
higher per cent. For example, note that the largest difference
between per cents is 12 in type H8, Low Verbal. For this difference
the greater per cent frequency is in the Factory group. Next
comes H10, Hi Verbal, most characteristic of the Suburban group.
These two Verbal types therefore represent the greatest typological
difference between the two groups. If you look through the other
significant differences you will discover that the Suburban group
falls more heavily into Low Memory (H3) and Low Speed (H7) whereas
the Factory Children occur more frequently in the Hi Memory
and Hi Speed types. Verbal, Memory and Speed therefore most
markedly differentiate the typological difference between Factory
and Suburban children.

Since sex differences in abilities are of universal interest,
I have also presented the data for determining the typological
similarity of the Boy vs. Girl subgroups, in the far right columns
of Table 3. From their columns of per cents in the 16 classes, the
index of similarity for the sex groups, worked out below the table,
is seen to be P = .85, somewhat higher than for the Factory and
Suburban groups. If you examine in detail the significant
differences, you will find that boys more frequently fall into Low
Speed, Low Memory, and Low Verbal types, the girls being,
conversely, in the Hi types in these abilities. On the other hand
girls fall more frequently into Low Form (Space) types, boys into
Hi Form. This finding on the Verbal favoring girls, the Form (or
Space) favoring boys, has been confirmed in many studies, but Low
Speed and Low Memory in boys is a less well known finding.

Similarity of empirically-derived typologies of the groups.--
The above analysis informs us of differences between Factory and
Suburban Children only on the single common typology of the
Inclusive group. But for fuller information, we need to discover
empirically the typology of each group independently of the other,
and to compare directly their two typologies. The procedures for
doing so are available in programs of the BC TRY System. On the
156 Factory Children separately we objectively determine their
typology by the OTYPE and OSTAT programs described above. You will
find its 15 classes in Table 4, where under "Factory Children" they
are listed as types F1 down through F14 to Unique. Their Z-score
profile values and descriptive names are also given. You will also
find the homogeneity, or H, coefficients that describe how "tight"
each O-type is in its Z-scores on the four dimensions. This
coefficient has been described in detail elsewhere (Tryon, 1955, 1967a)
and with special emphasis in a recent paper on the prediction of
"outside" attributes of O-types (Tryon, 1967b). Suffice here to
say that an H value of 1.00 means that all individuals in an O-type
have exactly the same scores on each of the four dimensions,
whereas an H of .00 means that the scores are as variable in all
four dimensions as in the full supply of all 301 children.

In similar fashion the separately worked out typology of the
Suburban Children is given at the right in Table 4, where you will
find the 13 classes of these children listed from S1 through S12
to Unique.

You can get a general impression of the typological similarity
of the two groups by comparing the descriptive names of the two and
by noting from these names which types are present in both groups
and which ones are present in one but absent in the other.

We need a more precise comparison of the different typologies.
To achieve such precision we project all the 26 types of both groups
(14 F types plus 12 S types) into the same analysis, from which we
get exact values of the similarities and differences between them.
The procedures for doing so are called "OCOMP analysis" in the BC
TRY System. The logic of the analysis is quite simple: each type
is considered an abstract "individual" plotted as a point
in the cluster score space of V, S, F, and M, where its locus is
determined by its four Z-scores listed in Table 4. Program EUCO
computes the Euclidean distance between each pair of types, and
prints these values in a pair-comparison matrix from which one can
read off precisely the degree of similarity between any two types.

Space limitations do not permit printing this Euclidean
distance matrix here. In its stead, however, I present a pictorial
representation of the distances between the types in the form of
the SPAN diagram given in Fig. 3. To secure this diagram, the EUCO
matrix is first transformed to a correlation matrix by correlating
columns of EUCO values, then running this r-matrix through a standard
key cluster analysis, ending in the SPAN diagram of Fig. 3.
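
The first two steps of this procedure (a matrix of Euclidean distances between type profiles, then a correlation matrix obtained by correlating its columns) can be sketched as below. This is not the EUCO program, and the key cluster analysis and SPAN plotting are not attempted; the function names and the four type profiles are hypothetical and are not taken from Table 4.

```python
import math


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between profiles (lists of Z-scores)."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def column_correlations(d):
    """Correlate every pair of columns of a distance matrix."""
    cols = list(zip(*d))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


# Hypothetical V, S, F, M profiles: two low types and two high types.
profiles = [
    (44, 36, 44, 37),
    (45, 38, 46, 36),
    (62, 55, 58, 61),
    (60, 57, 61, 63),
]
d = euclidean_matrix(profiles)
r = column_correlations(d)
print(round(d[0][1], 2), round(d[0][2], 2))
print(round(r[0][1], 2), round(r[0][2], 2))
```

Two types whose columns of distances to all other types rise and fall together correlate highly, so types that are close in the score space also end up close in the r-matrix handed to the cluster analysis.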

The configuration on the SPAN sphere describes the similarities
and differences between the Factory and Suburban O-types. The
circles represent the 14 Factory O-types, the squares the 12
Suburban. I also include in this analysis the 15 Inclusive H-types
from Table 3. The sizes of the circles and squares and the length
of the underline of the H-types are proportional to the frequency
of each type. Note that the four dimensions, V, S, F, and M are
also plotted, these being secured by inputting model abstract
"individuals" whose four Z-score values are especially selected to
enable one to plot the dimension lines as score axes.

The large super-cluster at left center consists of types all
in the "LOW" region, meaning that generally they have Z-scores
below the mean on all four dimensions. Note, however, that this
super-cluster breaks off into two general subclusters. The upper
one consists largely of Suburban types S1, S2, S3, fairly well
represented by the Inclusive types H1, H4, H3 and H6, whereas the
lower subcluster consists largely of F, or Factory, types, which
with S4 are well represented by Inclusive types H2, H5 and H7.

From these facts we discern the similarities and differences
between the types in this general region of low scoring, noting
especially that there are real differences in the typologies of the
two groups in this region. I leave to the interested reader a
detailed study of the rest of the configuration. The scores of
O-types can be approximated by reading off projections on the four
score axes, but more accurately by reading the actual values and
descriptions given in Tables 3 and 4. Types represented by broken
circles and squares lie in the fourth dimension.

Generally, a study of the configuration reveals findings
similar to those found from the similarity of the frequency-patterns,
namely, that Verbal, Memory and Speed most markedly differentiate the
Factory and Suburban groups. For example, note at the top of the
diagram that Hi Verbal is represented only by a Suburban type, S7.
Low Verbal through the southern hemisphere is heavily dominated by
Factory types.

A final, salient question is this one: How well do the 15
Inclusive O-types representatively sample the 26 different types
in both ecological groups of children? This question is important
because in the practical usage of the typology of abilities, these
would be the types usually used for the classification of
individuals. The answer is provided by noting whether one or more of
the 15 H-types lie in all regions occupied by the 26 Factory and
Suburban types. By inspecting the SPAN diagram and by comparing
the F and S types of Table 4 with the H types of Table 3 you will
note that the 15 H-types fairly cover the ground.

The Study of the MMPI

Comparative  dimensional  analysis  (COMP  analysis) 

The second study selected for comparative dimensional and
typological analysis is that of the responses to the items of the
MMPI by Normals vs. Patients.

The item-variables.-- The variables are 118 items of the MMPI
drawn from the full item supply of 566 to which the subjects
responded. The 118 were those shown in a previous study to be the
most salient set (Tryon, 1966b). The method used in the prior
study is called the BIGNV procedures of the BC TRY System, a
method that enables one to perform cluster or factor analyses
unrestricted by the number of variables or number of subjects.
The subjects were the Inclusive Group consisting of the Normal
and the Patient groups.


The groups.-- The Normals were 90 Armed Service Officers
matched for age and education against 220 Patients. The latter were
outpatients of a VA Mental Health Clinic, consisting of 70 diagnosed
Schizophrenics, all with a history of hospitalization within the
previous 6 years, and 150 diagnosed Anxieties, none with a history of
any hospitalization for psychiatric disorder.

Dimensional analysis of the 118 item-variables in the
Inclusive Group.-- Recall from the Holzinger study that a comparative
dimensional analysis of two groups, here the Normals and Patients,
can only be performed when the subjects are measured on the same
dimensions defined by the same variables, usually those discovered
in a dimensional analysis of the Inclusive group. This analysis
revealed four "basic" MMPI item-clusters: I Introversion, B Body,
S Suspicion, and T Tension. The defining items of these four
dimensions are those whose item-numbers are listed in Table 5,
section A. I do not present a more detailed description of these
items because it would be too voluminous; but a paraphrasing of them
is given in the previous study (Tryon, 1966b, Table 2), and the
exact contents are given in MMPI booklets, generally available to
most readers. You will note in Table 5 that each item-cluster
consists of a "Full Form" and a "Short Form". The comparative
dimensional analysis presented in this section was performed on
the scores of subjects on the Short Forms, and it also includes
the Short Form items of the other three "dependent" item-clusters,
D Depression, R Resentment, and A Autism, whose item-numbers are
also given in Table 5, section B.

The dimensional analysis of the Inclusive Group from which the
four basic and three dependent dimensions were derived cannot be
presented here because it is fully explicated in the prior
publication. However, the results of it are so similar to those given
below on the Patient Group (see Fig. 4, top diagram), that no point
would be served in giving the findings here. In sum, it was found
that seven dimensions were required to account for the
intercorrelations among the 118 items, but that the first three basic
dimensions, Introversion, Body, Suspicion, were the most nearly
independent clusters (as Fig. 4 shows); only four pools of small
residuals remained in the matrices of the four D, R, A, and T clusters.


(Fig. 4 about here)

(Table 6 about here)


Since the last of these, T Tension, had the greatest generality
of the remaining four, it was decided to add T to I, B, and S as
the final set of basic four dimensions of the MMPI.

Dimensional analysis of the 118 item-variables in the Patient
Group.-- A full cycle key cluster solution of the intercorrelations
between the 118 items in the Patient group resulted in the cluster
structure depicted in Fig. 4, top diagram. This factoring process
was "preset" on the four basic dimensions defined by the items of
I, B, S, and T. In the tight cluster at lower left in the
configuration the symbols plotted as "I" and enclosed in a broken
line are 15 of the 17 Introversion items that define this cluster.
The remaining two lie nearby in the direction of the two arrows. In
another tight cluster over at lower right are 16 Body, or B, items;
the 17th item was dropped from the analysis because of trivial
communality (h² < .1). At the top you will find the Suspicion cluster.
The remaining four clusters, Depression, Resentment, Autism, and
Tension, lie within the framework of the three I, B, S clusters.
Clearly the total configuration for the Patient group shows an
excellent cluster structure; it is virtually the same as that
found previously in the total Inclusive group (Tryon, 1966b, Fig. 1).

Dimensional analysis of the 118 item-variables in the Normal
Group.-- A radically different dimensional structure emerges in the
Normals, shown in the SPAN diagram of Fig. 4, lower. The dramatic
change is in the Body cluster which was so sharply evident in the
Patient group. It is absent as a distinct cluster among Normals!
And so are the Depression and Autism clusters. But Introversion and
Suspicion do appear as fairly independent item groups. Tension and
Resentment also remain but move into a grand arc bounded by
Introversion and Suspicion. It appears as if only Introversion and
Suspicion are the dominant and distinctive dimensions of Normals
in the MMPI item-clusters.

Comparison of the dimensions within each group separately
(COMP1).-- Precise numerical statements about the seven item-
clusters in each of the two groups are given in Table 6, section A
(analogous to Table 2 in the Holzinger Problem), and sections C
and D. The relationships between the seven domains represented
as dimensions (or "oblique factors") are given in section A by the
inter-domain rcc values; those for the Patients are above the
lined-off diagonal, those for Normals below. Recall that these
"correlations between oblique factors" are merely abstract metric
descriptions of the complex relationships depicted in the SPAN
diagram, and though they are more precise numerical statements
compared to the verbal statements about the configuration, they are
more difficult to organize conceptually. And they can be misleading.
I must leave to the reader a detailed examination of this complex
table of relationships, suggesting that he cross-reference his study
of it by simultaneously referring to the visual configuration in
Fig. 4.

Several obvious points may, however, be mentioned here. In
both groups the Introversion and Suspicion dimensions are the most
independent, and Tension is most positively correlated with all
the other dimensions. But the Body dimension is radically different
in the two groups, fairly specific in the Patients but rather general
in the Normals, indeed correlating .90 with Autism! But this
generality of the Body dimension is misleading in the Normal group,
because from the configuration we know that Body is not a cluster-
defined dimension in Normals but a mere sampling of heterogeneous
items from their whole sphere of items. It is an omnibus grab-bag
of items in the Normals, just as is Autism, so their high correlation
is merely due to both being similar hodgepodges.

Direct comparative analysis of the dimensions across groups
(COMP2).-- When we project the dimensions of the two groups into the
same COMP2 analysis, we see directly and clearly the relations among
the dimensions both within but especially across the two groups.
They are pictorially displayed in the single SPAN diagram of Fig. 5
(analogous to Fig. 2 of the Holzinger Problem). The sharply
differentiated and spread out dimensions of the Patients, denoted
by the subscript "P" attached to the seven dimensions, I, B, S, D,
R, A, T, confirm the within-group cluster structure of their items
as previously depicted in the upper sphere of Fig. 4. In contrast,
the within-group structure of the dimensions of the Normals,
indicated by the subscript "N", confirms the narrow, essentially
two-dimensional band ranging from Introversion to Suspicion.

Consider, now, the similarity of the dimensions across the
two groups as objectively measured by the cos θ values, given in
Table 6, section B, especially those down the upper left to lower
right diagonal. The most similar dimensions across the groups are
Introversion (.73), Suspicion (.74), Resentment (.80), and Tension
(.73). The least similar is Body (.49), a different kind of
dimension in the two groups.

Attention is again drawn to the correspondence between the
index of dimensional similarity, cos θ, and the inter-domain
("common factor") correlations, rcc, given as paired values in
section A of Table 6 (analogous to section A of Table 2 in the
Holzinger study). They show a close correspondence only for tight
clusters I, S, R and T. Thus it is that in the comparative
dimensional analysis of variables, the COMP2 analysis accurately
reveals the degree of similarity only of those dimensions defined
by tight (highly collinear) clusters, a matter developed in
technical Appendix A.

Comparative typological analysis in the MMPI Problem (OCOMP analysis)

The comparative typological objective is to discover the degree
to which O-types of individuals, formed by classifying together
individuals having the same pattern of Z-scores on the four basic
MMPI dimensions, I, B, S, T, have the same structure in the Patient
and Normal Groups. In this analysis, each person was scored by
his Full Form scores on I, B, S, and T.

Similarity of frequency-patterns of the two groups on the
common typology of the Inclusive Group.-- In the typological analysis
of the Inclusive group by program OTYPE (Iteration [number
illegible]), I reported previously that 14 O-types emerged (Tryon,
1967a). These are listed as types M1 to M14 in Table 7, under
"Inclusive typology", where you will find their frequencies,
Z-scores on I, B, S, T, and descriptive names. When the Normal and
Patient subjects are sorted to these 14 Inclusive O-types, the
percentages of cases falling into them are the values listed in
the "% in each group" columns. As a point of special interest, I
separated the Patient group into its two component diagnostic
groups, Anxieties and Schizophrenics.

The overall similarity of the typology of the three groups
in relation to each other is given by their P values at the foot
of the table (analogous to the presentation in the Holzinger Problem
given in Table 6.3). Normals vs. Anxieties show a P = .17, indicating
virtually no similarity in their typological structures. Curiously,
there is a mild typological similarity of Normals and Schizophrenics,
whose P = .41. The typologies of the Anxieties and Schizophrenics,
in contrast, bear considerable resemblance, having a P = .73.
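The three P values can be checked with a few lines of arithmetic. The sketch below assumes that P is the normalized cross-product of the two percentage profiles, which is how the computation at the foot of Table 6.7 appears to be laid out; the function name is hypothetical, and the percentages are the "% in each group" columns as best they can be read from the scanned table.

```python
import math

def pattern_similarity(p, q):
    """Normalized cross-product of two frequency (percentage) profiles.

    Assumed form: sum(p*q) / sqrt(sum(p^2) * sum(q^2)).
    """
    cross = sum(a * b for a, b in zip(p, q))
    return cross / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))

# Percent of each group falling in Inclusive types M1..M14
# (as read from Table 6.7; the reading itself is an assumption).
normals        = [20, 39, 16, 5, 3, 3, 0, 12, 0, 0, 2, 0, 0, 0]
anxieties      = [1, 1, 1, 15, 5, 11, 11, 3, 10, 8, 10, 9, 9, 5]
schizophrenics = [6, 3, 7, 6, 10, 7, 7, 14, 1, 3, 19, 10, 4, 3]

p_na = pattern_similarity(normals, anxieties)
p_ns = pattern_similarity(normals, schizophrenics)
p_as = pattern_similarity(anxieties, schizophrenics)

print(round(p_na, 2), round(p_ns, 2), round(p_as, 2))
```

Under these assumptions the rounded values agree with the .17, .41 and .73 quoted in the text.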

But the details of their differences, given in the column
headed "Differences", are of great interest. Note that the Normals
are almost exclusively concentrated in types M1, M2, and M3,
described generally as Extrovert, Healthy, and Relaxed, with a few
in M8, the Suspicious. The Anxieties excel in the Somatic types,
M7, M9, M10, M13, M14, thus being persons most preoccupied by body
disturbances. The Schizophrenics, compared to the Anxieties,
behave typologically somewhat like Normals, excepting that they
fall heavily in the Introvert type, M11.
Similarity of empirically-derived typologies of the groups.--
Fuller information on the differences between the O-types of the
Normal and Patient Groups comes from direct comparison of their
typologies as these are empirically derived separately by the
OTYPE and OSTAT programs and then projected into the same comparative
analysis. In Table 6.8, left, you will find that, when the
typology of the Normals is worked out independently, they fall into
14 types, N1 to N14, with no Unique individuals. In the right
sector of the table, you will discover that the Patients were
allocated to 13 O-types, P1 to P13.

As you look through the descriptive names of the Normal and
Patient O-types, you may be astonished to discover that there is
no overlap of their 27 types except for the Average and the
Trusting O-types, but that even in these the Normals have only a
handful of cases whereas they are abundant in the Patient group.

In sum, one finds that Patients are clearly distinguished
from Normals in their objectively-derived patterns of MMPI scores.
This finding goes directly to the question of the validity of the
MMPI in distinguishing patients from normal persons. Our finding
here definitely demonstrates the validity of the MMPI items in
differentiating Normals from Patients, provided the item-cluster
scores on I, B, S and T are used (and not the hodgepodge in the
usual unclustered scales) and provided the objective typology
described in these pages is used as the classificatory scheme.

When the 27 types are projected into the same EUCO-analysis
(see the treatment of EUCO-analysis in the Holzinger Problem) along
with the 14 Inclusive O-types, the grossly different typological
structure of the Normals and Patients stands out boldly. This fact
is clearly evident in the spherical representation of the types
given in Fig. 6.6 (analogous to Fig. 6.3 of the Holzinger Problem). The
Normal types, symbolized by "N" and placed in circles, are virtually
all located in a super-cluster at the left in the "LOW" score ranges
on all dimensions. The Patient types, symbolized by "P" placed in
squares, are largely in the super-cluster at the right or "HIGH"
region of the configuration. This separation confirms, of course,
the finding of the previous section, but the SPAN configuration
provides a more differentiated description.

Finally, observe the locus of the Inclusive types, symbolized
by "M" and underlined. You will discover that these 14 types are
located in all regions of this typological space where there are
Normal and Patient types. This fact means that as a system of
classifying individuals, normal or mentally-ill, the 14 Inclusive
types, expounded on in more detail in an earlier paper (Tryon,
1967a), satisfactorily cover the ground.


Appendix A. Logic of cos θ as an index of similarity between any
two dimensions.

We begin by noting that within a group the index of similarity
of any two dimensions, $C_i$ and $C_j$, is the inter-domain correlation,

$$ r_{C_i C_j} = \frac{\sum r_{ij}}{\sqrt{\sum r_{ii} \, \sum r_{jj}}} \qquad (1) $$

where $\sum r_{ij}$ is the sum over the matrix of raw rs across definers of
the two dimensions, and $\sum r_{ii}$ and $\sum r_{jj}$ are sums over the raw rs
within definers of each dimension. This is the old "correlation
of sums" formula (Tryon, 1959, equation 24).
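A minimal sketch of the correlation-of-sums computation may make the bookkeeping concrete. It assumes the within-definer sums run over the full square block of the correlation matrix, diagonal entries included (whether those entries are unities or communalities is left to whatever the matrix contains); the small matrix and the function name are invented for illustration.

```python
import math

def correlation_of_sums(R, definers_i, definers_j):
    """Inter-domain correlation of two dimensions from raw correlations R.

    definers_i, definers_j: indices of the variables defining each dimension.
    """
    cross = sum(R[a][b] for a in definers_i for b in definers_j)
    within_i = sum(R[a][b] for a in definers_i for b in definers_i)
    within_j = sum(R[a][b] for a in definers_j for b in definers_j)
    return cross / math.sqrt(within_i * within_j)

# Hypothetical 4-variable correlation matrix: variables 0,1 define one
# dimension and variables 2,3 define the other.
R = [
    [1.00, 0.60, 0.30, 0.20],
    [0.60, 1.00, 0.25, 0.30],
    [0.30, 0.25, 1.00, 0.50],
    [0.20, 0.30, 0.50, 1.00],
]
r_cc = correlation_of_sums(R, [0, 1], [2, 3])
print(round(r_cc, 3))
```

Here the cross-block sum is 1.05 and the two within-block sums are 3.2 and 3.0, so the index is 1.05 divided by the square root of 9.6.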

But this formula cannot be used in computing dimensional
similarity across different groups since there are no raw rs
between variables in different groups. We do, however, have the oblique
factor coefficients of the n variables on dimensions $C_i$ and $C_j$ in
different groups. Adjoining the matrices of factor coefficients
of the two (or more) groups, we can compute the index of proportionality,
$P$, between factor coefficients of all pairs of dimensions
within and across groups. This index is (Burt, 1948; Tucker,
1951; Wrigley & Neuhaus, 1955; Tryon, 1959):

$$ P_{C_i C_j} = \frac{\sum_v r_{vC_i} \, r_{vC_j}}{\sqrt{\sum_v r_{vC_i}^2 \, \sum_v r_{vC_j}^2}} \qquad (2) $$

where $r_{vC_i}$ and $r_{vC_j}$ are the vectors of oblique factor coefficients of $C_i$
and $C_j$. With a little algebra it can be shown that when
the definers of any dimension have raw correlations that are
perfectly collinear (are of rank 1), then we can solve for $r_{C_i C_j}$
within a group using only the value of $P$. The equation
is (Tryon, 1962):


rVj  3 


1  -  xA-P, 


OOS  o. 


Expression (3) is called cos θ because its magnitude is the
cosine of the central angle between $C_i$ and $C_j$ when these dimensions
are expressed as points on the hypersphere (the SPAN diagrams), such
as that of Figs. 1 and 2, that is, whether or not they are dimensions
within a group or across groups. The value of cos θ gives exactly
the value of $r_{C_i C_j}$ only when (1) the matrix of correlations between
the definers of each dimension is of rank 1 and (2) when the
adjoined vectors of their factor coefficients include as rows only
the defining variables of the two dimensions. Otherwise, cos θ is
only an approximation to $r_{C_i C_j}$. Program COMP2 of the BC TRY System
computes cos θ for condition (2).
Table 6.1  The 24 Variables of the Holzinger Problem

Spatial Tests
  F1    VIS   Visual Figure Completions (f)
  F2    CUB   Cube Similarities (f)
  F3    FDD   Paper Form Board (f)
  F4    DOB   Lozenge Shape Rotations (f)

Verbal Tests
  V5    INF   General Information (v)
  V6    CMP   Paragraph Comprehension (v)
  V7    SNT   Sentence Completion (v)
  V8    WCL   Word Classification (v)
  V9    WMN   Word Meaning (Vocabulary) (v)

Speed Tests
  S10   ADD   Addition (s)
  S11   COD   Code Substitution (s)
  S12   CNT   Counting Groups of Dots (s)
  S13   SCC   Straight or Curved Capitals Discrimination (s)

Memory Tests
  M14   WRG   Word Recognition (m)
  M15   NRG   Number Recognition (m)
  M16   FRG   Figure Recognition (m)
  M17   WN    Object-Number Recall (m)
  M18   NF    Number Figure Recall (m)
  M19   FW    Figure Word Recall

Mathematical-Ability Tests
  N20   DED   Deduction
  N21   PUZ   Numerical Puzzles
  N22   RSN   Problem Reasoning
  N23   SER   Series Completion
  N24   ART   Woody-McCall Mixed Fundamentals, Form I

(f)  A definer of F (Space)
(v)  A definer of V (Verbal)
(s)  A definer of S (Speed)
(m)  A definer of M (Memory)
Table 6.2  Similarity of the Four Basic Holzinger Abilities,
V, S, F, M, Within and Between the Suburban and
Factory Groups.

r_cc is the inter-domain r, "correlation between
oblique factors", from the correlation of
sums of rs (a)

Cos θ is the estimated r_cc from the index of
proportionality (b)


Similarity  of 


V  Suburban 

Verbal  Factory 

S  Suburban 

Speed  Factory 


JP,_of__the  factor  coefficients 

cluster  dimensions  within  each  group 

V  S  F 

VjprbcJ, _ SjB.w.d _ S9X®.(3PAO«J— 

rCQ  CosS  rQC  CosS  rQC  CobS 

"unities  ;m  — rnr 

:iji  :!H  Unlt1*0  :II  :II 


v 

Form 


Suburban 


(Space)  FactorJ 

M  Suburban 
Memory  Factory 

Ii  3imilarJ.ty  of 


LZcc 

CosO 

Unities 

«U3 

.43 

.43 

.58 

.58 

.35 

.37 

*46 

.47 

.14 

.34 

Unities 
.53  .51 

o29  .26 


.56  .54 
39  .36 


Unities 

.60  .56 

.27  .26 


_ &2BSEL- 

rcc  Cm* 


°  -I —  ' 

•  14  >14 

.56  .54 

.39  .36 

.60  .56 

.27  .26 


!  Unities 


o  lust  or  dimensions  between  groups  (Cos  6  only) 
- - -  .  - - Suburban _ _ 


!  V  ,  Verbal  .96 

j 

i  S|,  Speed  .46 

Factory  |Pf  porta  (Space). 46 

IM-  Memory  .20 


|M^  Memory  |.2o  _ i  *42  {  *41 

C  Generality  of  each  dimension  (reproducibility  of  rs).° 


Suburban 

Factory 


U«  Reliability  coefficient  (alpha)  of  oluater  soore  on  eaoh  dimension  0 


Suburban 

Factory 


a  Vyg  from  USA 
b  Cose  from  COMF1 


From  CSA 
Table 6.3  Similarity of Frequency Patterns of Factory
vs Suburban Children on the Common Inclusive
Typology in the Holzinger Problem.


Inclusive 

typology 

a 

Fac 

L*u 

tory 
iburbr 
in  ~ 
group 

va. 

m 

Boys  vs 

Oirla 

f 

each 

1 

1 

%  in 

A.rh  rmnim 

I 

1 

Types 

Froq 

.Z- 

Dosoriptive 

name 

Fact 

Pf 

Sub 

_Pa  , 

r 

Oirla 

Pg _ 

-Diff 

V 

s 

P 

K 

pb 

111 

W 

3b 

l\h 

iZ 

Low  Speed  &  Memory 

*■ 

6  ! 

-2 

8 

2 

6(B) 

){2 

6 

36 

47 

49 

IS 

Low  Verbal  fr  Memory 

5 

0 : 

5(F) 

3 

2 

1 

H3  1 

?i  ! 

43 

50 

48 

Low  Memory 

4 

10 

"6(8] 

10 

5 

5(B) 

m 

9 

50 

J9 

ii- 

40 

Low  Speed  &  Form 

2 

4 

2 

4 

-2 

H5 

13  1 

42 

36 

49 

Low  Verbal  &  Form 

1 

5 

4 

1 

2 

6 

-4(Ql 

H6 

20 

50 

50 

13 

46 

Low  Form 

6 

7 

-1 

5 

8 

:-3(o) 

:r7 

19 

47 

11 

43 

30 

Low  Speed 

3 

10 

-7(3) 

8 

5 

j  3(B) 

llfl 

23 

l£ 

47 

33 

54 

Low  Verbal 

13 

1 

12(F) 

9 

6 

!  3(B) 

119 

36 

51 

51 

48 

51 

Average 

13 

12 

1 

10 

15 

-5(o) 

H10 

22 

& 

50 

53 

47 

Hi  Verbal 

3 

12 

Ell 

7 

8 

-l 

Hll 

23 

47 

51 

52 

Hi  Speed 

11 

4 

EH 

6 

9 

-3(0) 

H12 

14 

& 

^4 

59 

58 

Hi  Verbal  &  Speed 

3 

6 

-*3(31 

3 

6 

-3(0) 

HI  3 

27 

52 

5i 

61 

49 

Hi  Form 

9 

9 

0 

13 

5 

8(B) 

H14 

23 

'52 

49 

51 

jM 

Hi  Memory 

9 

6 

5 

10 

-5(0) 

H15 

8 

57 

63 

54 

67 

Hi  Speed  &  Memory 

3 

3 

mm 

2 

3 

-1 

Unique 

19 

Unique 

7 

6 

II 

7 

6 

l 

N 

301 

Ip 

■ 

I 

_ 

N 

iteration  I-,  of  OTYPE 

L. 


From OTYPE and OSTAT


Similarity of frequency patterns of Factory and Suburban children
Pffl  3  2PfPa/l^7l^7-  6i7/f5ie  ifiio  -  „75 

Similarity of frequency patterns of Boys and Girls

Pbg  *  66'*/  ¥792  1(786  -  .as 

oxgall  loanee 

Por  m  ■  16  types,  the  mean  proportion  in  them  la  p*  ■  .06,  whence: 

)a.  ■  3U0»f^7a7M~;'i/Tl}  *  30yrf  ,o6)( ,b^)( l/lf6  +  l/lb$~  3. 

a  r  b 
Table 6.4  Within-group Typologies of the Factory and
Suburban Children in the Holzinger Problem.


Factory  Children 

j  Pro.fi.lt*  level  Descriptive 
TypelFreq).^nd_hftB'f^gdr}ttity,  name 

_ |  _ 

‘  v  i,  ,/  01 !  Low  Verbal, 

F1  f  14  *4 6  81 ;  speed  &  Mom 

F2  ~  42.  6<!  5*’  A  -07.  4  Memory 

F3  14  57  50  57  ^0;  06 | Low  Memory 

j 

F4  18  ; 41.  45  XL  US'  -07  Low  Form 

1*5  5  1 48  54  ii£  44  j  .91  Low  Form 

P6  |  9  53  46  45  46 1  oQ9 

(  1 

F7  I  J-8  42  57  47  5l!  07 

P8  1;  585551551-37 
P9  li  47  66  40  46  c03  Hi  Speed 

F10  j  4  52  70  55  50  ,91  Hi  Speed 

Fll  15  ;40  4«  6£  $l  n11  H1Fop* 

5  aaH53.?5  H3pe25bfpor, 
713  j  23  43  50  53  59  082  I 

?14  9  52  51  4.8  66  .  79  Hi  Memory 

Uniq  !  0  I 


N  j.'.  5  b 

Bte.  85  084 .88 


-  'Uf'4r.  "  J».« 


Iteration  1^,  0 1  OTYPE 


Suburban  Children 

Profile  level  Descriptive 
Type  Fr  eq  .and.  v.£nQ<;infti1iX  name 

'VS  P  Ml  ff 

51  e  si  35  48  36  ,86  * 

52  13  48  51  43  37  .92  Low  Memory 


53  14  50  43 

54  17  45  36 


S6  21 


50  43  36  45  .87  Low  Form 
45  36  46  49  <-86  Low  Speed 
^6  40  44  50  -  94  Low  Verbal 

51  51  49  49  .93 


S7  16  66  51  53  4*  .92  Hi  Verbal 


1 36  10 


310  15 

Sll  12 


52  62  49  53  .86  HI  Speed 
66  a  59  57  .35  “  gSj4 
52  49  63  50  ,8t  Hi  Form 
56  53  52  6j^  .85  'll  M«aory 
68  50  62  69  .94 


i _ 

U7 

T*~ - T~ 


.91  87  .90 


Frot*  WAT 
Table 6.5  Defining Items of the Seven Item-Clusters of the MMPI


A. The four basic item-clusters

I: Introversion

(Full Form, 26 items, rel. .93; Short Form, 1st 17 items,
rel. .91)


tif 


377 

180 

86 

52 

292 

138 

-  57 

-371 

171 

*309 

-  79 

-353 

321 

267 

-547 

-479 

317 

304 

_ 2A1_ 

172 

_ =5.2.1  _ 

-264_  -449 

B: Body symptoms

(Full Form, 33 items, rel. .92; Short Form, 1st 17 items,
rel. .89)

-2k3  62  47  125  161  -  36  -160  -330 

189  -175  44-68  544  -163  191  -  2 

108  -230  -  55  10  72  -  51  -153  -  18 

_ =19J2 _ 114 _ ?S> _ 23 _ z_J _ -1PJ, _ 26J _ ^S£_ 

S: Suspicion and mistrust

(Full Form, 25 items, rel. .85; Short Form, 1st 17 items,
rel. .63)

404  436  368  447  406  89  455 

507  136  260  319  270  112 

383  24k  265  71  28k  426 

„ . 35LQ--...-34&- _ 469 _ 5£8_ _ 43.0 _ 3.1.6. _ 

T: Tension, worry and fears

(Full Form, 36 items, rel. .92; Short Form, 1st 17 items,
rel. .88)


368 

447 

406 

89 

200 

319 

278 

112 

265 

71 

284 

426 

469 _ 5S§ _ 418 

_ 3.1.6 

555 

238 

43 

448 

338 

439 

158 

3  22 

-131 

431 

506 

-242 

186 

-407 

335 

303 

360 

365 

337 

543 

340 

499 

182 

102 

u 

22 

494 

217 

442 

-152 

166 

...12 

.  .4.73 

_ 388 

_ 251— 

—422. 

B. The three remaining "dependent" item-clusters


D: Depression and apathy

(Full Form, 28 items, rel. .94; Short Form, 1st 17 items,
rel. .91)

76  -379  418  414 

-107  487  -  8  396 

236  41  549  61 

_ m _ £59 _ I? _ 4 U _ 

R: Resentment and aggression

(Full Form, 21 items, rel. .87; Short Form, 1st 16 items,
rel. .82)

94  375  536  14J 

336  39  139  1® 

468  301  234  28 

_ ^JL92_ _ 97  -  129,  162 

A: Autism and disruptive thoughts
(Full Form, 23 items, rel. .86; Short Form, 1st 17 items,
rel. .81)


}  Short  Form, 

142 

397  84 
526  357 
361  168 


let  17  it 

-  88 
-  46 
104 


;  Short  Form,  1st  16  it 


$1  m 


Short  Form,  1st  17  it  sms. 


559 

425 

560 

342 

33 

241 

511 

-329 

374 

359 

15 

545 

100 

459 

389 

349 

358 

345 

297 

356 

■ 
Table 6.6  Similarity of the Seven MMPI Item-Cluster Dimensions
Within and Between the Normal and Patient Groups.


r_cc is the inter-domain r, "correlation between
oblique factors", from the correlation of
sums of rs (a)


Cos θ is the estimated r_cc from the index of
proportionality (b)


Ps of the factor coefficients


A  Similarity  of  item  cluster  dimensions  within  each  /croup  * 

i,b 

T 

Intvov 

B 

Body 

S 

JSuflp.ic 

rcc  Co s* 

D 

Peprj9fl.e_ 

rcc  Coa* 

R 

Repent 

A 

Autism _ 

roc  CoB° 

T 

JLpmXqb- 

rcc  Cos* 

rrr  Cos* 

*cc  CoaC 

rC(J  Coe* 

I  Introversion 

3  Body 

S  Suspicion 

J">  Depression 

_ 

'»2  0I4.6 

>7  c06 

„?6  ,60 

,12  u13 
'  (T\a 
,61  ,52^ 
.56  .43 

.31  ,31 
L.J4  .34 

h  R\T 
.43  .35“ 

.71  .69 
.32  .31 
.08  -37 
k\i 

c47  .46 
.37  .37 
.66  .62 
,66  .65 

.33  08 
.50  49 
.65  61 
.57  .55 

,50  .49 
.63  .60 
.59  .57 
.78  .76 

1  Resentment 

A  Autis'ii 

.37  ,32 
o5l  .45 

.64  .52 
.90  069 

„?6  .69 
.77  ->70 

.76  .65 
.72  .59 

A\E 

.74  ,66 

-•64  .62 

X 

,79  .77 
07  .74 

i1  Tens!.-,* 

.59  ,53 

.72  .53 

,50  .48 

.72  .59 

.71  .63 

.67  ,61 

B  .jinLiiarity  of  it om-c luster  dimensions  between  groups  (Coe  O  only) 


BP 

s 

p 

DP 

6 

IA.  Introversion 

A 

.20 

.22 

•  53 

.42 

.38 

•49 

0 

BN  Body 

.33 

•47 

.32 

.49 

•51 

,50 

SN  Suspicion 

.15 

00 

.28 

.60 

.57 

.46 

If 

n 

DN  Depression 

,56 

.31 

.43 

.61 

.65 

,,58 

-59 

n 

A 

h 

Rn  Resentment 

.33 

,31 

.64 

.48 

,80 

.57 

-62 

An  Autism 

.35 

.42 

.61 

.45 

,60 

.69 

.60 

l'N  Tonsion 

.48 

.41 

•53 

.52 

.70 

.57 

C„ 

Generality  of  eaoh  dimension  (reproducibility  of  rs)  c 

Normals 

Patients 

.38 

.31 

.51 

.19 

X 

.53 

.52 

52 

.42 

.52 

.36 

X 

D. 

Reliability  coefficient 

(alpha)  of 

duster 

score 

on  each 

dimension 

c  I 

Normals 

Patients 

.81 

.90 

■M 

•  §3 

.83 

.80 

.79 

.76 

,79 

:K  | 

rcc  from  CSA 


Cos£  from  CQMP2 


From  CSA 


0 
Table 6.7  Similarity of the Frequency Patterns of Normals and
Patients on the Common Inclusive Typology in the
MMPI Problem.

            Inclusive typology                           % in each group     Differences
Type   Freq   I   B   S   T   Descriptive name           Norm  Anx  Schiz    N-A      N-S      A-S
                                                         pn    pa   ps

M1      24    ?   ?   ?   ?   Extro-Healthy               20     1     6     19(N)    14(N)    -5(S)
M2      38    ?   ?   ?   ?   Extro-Healthy-Relaxed       39     1     3     38(N)    36(N)    -2
M3      21   46   ?  48  42   Healthy-Relaxed             16     1     7     15(N)     9(N)    -6(S)
M4      31   46  47   ?  46   Trusting                     5    15     6    -10(A)    -1        9(A)
M5      17    ?  50  50  47   Extrovert                    3     5    10     -2       -7(S)    -5(S)
M6      24   50  50  50  50   Average                      3    11     7     -8(A)    -4(S)     4(A)
M7      22    ?  62  52  64   Somatic-Tense                0    11     7    -11(A)    -7(S)     4(A)
M8      26   48  48   ?  52   Suspicious                  12     3    14      9(N)    -2      -11(S)
M9      16   50  65  52  52   Somatic                      0    10     1    -10(A)    -1        9(A)
M10     14   50  64  66  60   Somatic-Suspic-Tense         0     8     3     -8(A)    -3        5(A)
M11     30    ?  48  48  50   Introvert                    2    10    19     -8(A)   -17(S)    -9(S)
M12     20    ?   ?   ?   ?   Intro-Suspic                 0     9    10     -9(A)   -10(S)    -1
M13     17    ?   ?   ?   ?   Intro-Somatic                0     9     4     -9(A)    -4(S)     5(A)
M14     10    ?   ?   ?   ?   Intro-Somatic-Suspic-Tense   0     5     3     -5(A)    -3        2
Unique   0                    Unique                       0     0     0      0        0        0

N      310                    N                           90   150    70

?  Entry not legible.

Similarity of frequency patterns

Normals vs. Anxieties:
$P_{na} = \sum p_n p_a \big/ \sqrt{\sum p_n^2}\,\sqrt{\sum p_a^2} = .17$

Normals vs. Schizophrenics:
$P_{ns} = \sum p_n p_s \big/ \sqrt{\sum p_n^2}\,\sqrt{\sum p_s^2} = 636 \big/ \sqrt{2368}\,\sqrt{1020} = .41$
Table 6.8  Within-group MMPI Item-cluster Typologies
of the Normals and Patients
Columns: Type, Freq, and Profile level and homogeneity (Normals at left, Patients at right). Entries surviving: N1, 8; P6, 10.
Fig. 6.1  Cluster structure of abilities
within the Suburban and Factory groups

Fig. 6.4  Cluster structure of 118 MMPI items
within the Patient and Normal groups

Fig. 6.5  Cluster structure of 118 MMPI items
across the Patient and Normal groups
SUM-SQUARED ERROR PARTITION

by  Geoffrey  H.  Ball 
Senior  Research  Engineer 

Stanford  Research  Institute 
Menlo  Park,  California 

I  INTRODUCTION 

Two difficult problems associated with cluster-seeking techniques
are the comparison of one technique with another and the interpretation
of the results of running such techniques on data.

In this paper, two techniques for finding minimum squared-error
clusters are described and two recent results related to these techniques
are discussed. Some sets of data for examination and evaluation of
cluster-seeking techniques are given and the two techniques, ISODATA and
the Singleton-Kautz algorithm, are compared. In addition, some useful
graphical presentations for showing the structure of a body of data are
presented. Methods that we have found helpful for interpreting experimental
results are also discussed.


II  TECHNIQUES FOR FINDING MINIMUM SQUARED ERROR CLUSTERS


Before describing the Singleton-Kautz algorithm and the ISODATA
algorithm, we discuss briefly the partitioning of a data set into minimum
squared error (MSE) clusters. By partitioning we mean the assignment of
each data point to one and only one of k subsets. In MSE partitions we wish
to find the assignment of the data points to the k subsets or clusters
that minimizes the squared error. This squared error consists of the sum,
taken over all the data points, of the squared distance from the data point
to the point that lies at the mean of the cluster to which the data point
is assigned.
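As a small illustration of this definition, the following sketch computes the squared error of a partition given as a list of cluster labels. The function names and the four sample points are hypothetical; only the definition itself comes from the text.

```python
def cluster_means(points, labels):
    """Mean point of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for x, g in zip(points, labels):
        s = sums.setdefault(g, [0.0] * len(x))
        for i, xi in enumerate(x):
            s[i] += xi
        counts[g] = counts.get(g, 0) + 1
    return {g: [v / counts[g] for v in s] for g, s in sums.items()}

def squared_error(points, labels):
    """Sum over all points of the squared distance to the point's cluster mean."""
    means = cluster_means(points, labels)
    return sum(
        sum((xi - mi) ** 2 for xi, mi in zip(x, means[g]))
        for x, g in zip(points, labels)
    )

# Four points at the corners of a long, thin rectangle.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sse_left_right = squared_error(points, [0, 0, 1, 1])   # split across the long side
sse_top_bottom = squared_error(points, [0, 1, 0, 1])   # split across the short side
print(sse_left_right, sse_top_bottom)
```

The left/right split gives a far smaller squared error than the top/bottom split, which is the sense in which it is the better of the two partitions.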

A convenient representation of squared error uses the sum-of-products
matrices T, W, and B.** The within-sum-of-products matrix W
is a constant multiple of the pooled covariance matrix of the data points.
It is obtained by subtracting its associated cluster average point from
each data sample and then calculating N times the covariance matrix for
this reduced set of points, where N is the number of data samples. The
between-sum-of-products matrix B gives the amount and direction of the
deviation of the cluster centers from the overall mean, weighted by the
number of data points in each cluster. The sum T = W + B is a constant
matrix independent of the partitioning of the data points.


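Formally, $W = [w_{ij}]$, where

$$ w_{ij} = \sum_{g=1}^{G} \sum_{k=1}^{N_g} \Bigl( x_{gik} - \frac{1}{N_g} \sum_{\ell=1}^{N_g} x_{gi\ell} \Bigr) \Bigl( x_{gjk} - \frac{1}{N_g} \sum_{\ell=1}^{N_g} x_{gj\ell} \Bigr), $$

where $x_{gj\ell}$ is from the $g$th group and is the $j$th component of the $\ell$th
data point, and $B = [b_{ij}]$, where

$$ b_{ij} = \sum_{g=1}^{G} N_g \Bigl( \frac{1}{N_g} \sum_{\ell=1}^{N_g} x_{gi\ell} - \frac{1}{N} \sum_{m=1}^{G} \sum_{\ell=1}^{N_m} x_{mi\ell} \Bigr) \Bigl( \frac{1}{N_g} \sum_{\ell=1}^{N_g} x_{gj\ell} - \frac{1}{N} \sum_{m=1}^{G} \sum_{\ell=1}^{N_m} x_{mj\ell} \Bigr). $$

Other symbols used are: $G$, the number of clusters; $N_g$, the number of data
points in the $g$th cluster; and $N = \sum_{g=1}^{G} N_g$, the total number of
data points.


** See Friedman and Rubin (1966) for a more detailed discussion of these
matrices.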
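The definitions of W and B translate directly into code. The sketch below (plain Python, with hypothetical function names and toy data) builds T, W and B for a labeled data set so that the identity T = W + B can be checked numerically; T is computed here as the sum of products of deviations from the overall mean.

```python
def scatter_matrices(points, labels):
    """Return (T, W, B): total, within and between sum-of-products matrices."""
    n, d = len(points), len(points[0])
    grand = [sum(x[i] for x in points) / n for i in range(d)]
    groups = {}
    for x, g in zip(points, labels):
        groups.setdefault(g, []).append(x)

    W = [[0.0] * d for _ in range(d)]
    B = [[0.0] * d for _ in range(d)]
    for members in groups.values():
        n_g = len(members)
        mean = [sum(x[i] for x in members) / n_g for i in range(d)]
        for i in range(d):
            for j in range(d):
                # deviations of members from their own cluster mean
                W[i][j] += sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in members)
                # deviation of the cluster mean from the overall mean, weighted by size
                B[i][j] += n_g * (mean[i] - grand[i]) * (mean[j] - grand[j])
    T = [[sum((x[i] - grand[i]) * (x[j] - grand[j]) for x in points)
          for j in range(d)] for i in range(d)]
    return T, W, B

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
T, W, B = scatter_matrices(points, [0, 0, 1, 1])
print(T, W, B, trace(W))
```

For this toy partition trace W equals 4, the squared error of the partition, while T is the same whichever labels are supplied.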
The eigenvalues and corresponding eigenvectors of $W^{-1}B$ play a central
role in discriminant analysis. Any function of the eigenvalues and the
corresponding eigenvectors of $W^{-1}B$ is invariant under linear transformations
of the data. Useful functions of these eigenvalues are: the
product of the eigenvalues, the maximum eigenvalue, and the sum of the
eigenvalues.

Another simple function of these matrices is the sum of the diagonal
elements. This can be represented symbolically by using the "trace"
operator, which is a linear operator. From T = W + B we get trace
T = trace W + trace B. The trace of a matrix also is the sum of the
eigenvalues of that matrix.

From this equation we can see that for a given set of data, minimization
of trace W results in the maximization of trace B. Trace T, as we
have previously noted, is constant for a fixed data set with respect to
modifications of the partitioning of the data set. Note that trace T
is not invariant with respect to linear transformations. The MSE partition
of a data set is the partition of the data set that minimizes trace
W.


Another important function of the W matrix is the Mahalanobis type of
distance, which can be written as

$$ \frac{1}{N} (x - y) W^{-1} (x - y)', $$

where x is one point of the data set and y is another point. This
distance is also invariant with respect to linear transformations of the
data set. It is not invariant, of course, to new groupings of the data
since, in general, this changes $W^{-1}$. The matrix $W^{-1/2}$ can be viewed as a
linear transformation of the original data, since the Mahalanobis type
distance between x and y can be rewritten as

$$ (u - v)(u - v)' = (xW^{-1/2} - yW^{-1/2})(xW^{-1/2} - yW^{-1/2})'. $$

Note that this is the (squared) Euclidean distance between u and v, where u and v
are obtained by linear transformation by $W^{-1/2}$ from x and y, respectively.

The difficulty lies in the necessity to compute $W^{-1}$ for each partition
of the data to be evaluated in the minimization, since W changes when
the partition changes. We discuss below this minimization of the sum of
the Mahalanobis type distances while simultaneously changing W as the
partition is altered.
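The equivalence between the quadratic form and a Euclidean distance after a linear transformation can be checked numerically. The sketch below is an illustration under stated assumptions: it drops the constant factor 1/N, and it whitens with a Cholesky factor of W rather than the symmetric square root $W^{-1/2}$ (a swapped-in technique; either transformation gives the same distances). The 2x2 matrix, the points and the function names are invented for the example.

```python
import math

def cholesky(W):
    """Lower-triangular L with L L' = W (W symmetric positive definite)."""
    n = len(W)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(W[i][i] - s)
            else:
                L[i][j] = (W[i][j] - s) / L[j][j]
    return L

def whiten(L, x):
    """Solve L u = x by forward substitution (the transformed point u)."""
    u = []
    for i in range(len(x)):
        u.append((x[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i])
    return u

def quad_form_2x2(W, d):
    """d W^{-1} d' using the explicit inverse of a 2x2 matrix (check only)."""
    (a, b), (c, e) = W
    det = a * e - b * c
    return (e * d[0] ** 2 - (b + c) * d[0] * d[1] + a * d[1] ** 2) / det

W = [[4.0, 2.0], [2.0, 3.0]]          # hypothetical within-sum-of-products matrix
x, y = (3.0, 5.0), (2.0, 3.0)

L = cholesky(W)
u, v = whiten(L, x), whiten(L, y)
euclidean_sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
direct = quad_form_2x2(W, [x[0] - y[0], x[1] - y[1]])
print(euclidean_sq, direct)
```

Both numbers come out at 11/8, so distances measured with $W^{-1}$ are ordinary squared Euclidean distances between the transformed points, which is why W must be refactored each time the partition changes.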

Dr. Richard Singleton has shown that for an MSE partition it is
necessary (but not sufficient) that the hyperplane that is the perpendicular
bisector of the line connecting any two cluster means not intersect the
convex hulls of those two clusters. (This requires that the convex hull*
of one cluster not intersect the convex hull of another cluster.)


* The minimum volume convex body sufficient to contain all of the data
points in one cluster. If the volume is zero (i.e., the data are linearly
dependent), then minimize the volume in the linear subspace of highest
dimensionality in which the volume is non-zero.


It  follows  from  this  condition  that  a  partition  can  be  a  stable  MSE 
partition  only  if  the  means  of  the  respective  clusters  are  such  that  the 
above  condition  is  satisfied.  As  we  describe  below,  the  ISODATA  procedure 
uses  this  condition  to  seek  an  MSE  partition  by  reassigning  patterns  that 
do  not  meet  that  condition  to  that  cluster  having  the  closest  cluster 
center.  The  use  of  the  perpendicular  bisector  can  be  generalized  to 
distances  measured  using  the  Mahalanobis  type  distance. 
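One way to test the necessary condition in practice is sketched below. It relies on the observation that the perpendicular bisector between two means leaves a cluster's convex hull untouched exactly when no point of that cluster is closer to the other mean than to its own; points lying exactly on the bisector are tolerated here, which is an assumption of this sketch. Names and data are hypothetical.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cluster_means(points, labels):
    groups = {}
    for x, g in zip(points, labels):
        groups.setdefault(g, []).append(x)
    return {g: [sum(c) / len(m) for c in zip(*m)] for g, m in groups.items()}

def meets_bisector_condition(points, labels):
    """True if no point is strictly closer to another cluster's mean than to its own."""
    means = cluster_means(points, labels)
    for x, g in zip(points, labels):
        own = sq_dist(x, means[g])
        if any(sq_dist(x, m) < own for h, m in means.items() if h != g):
            return False
    return True

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
left_right = meets_bisector_condition(points, [0, 0, 1, 1])   # squared error 4
top_bottom = meets_bisector_condition(points, [0, 1, 0, 1])   # squared error 100
lopsided = meets_bisector_condition(points, [0, 1, 1, 1])
print(left_right, top_bottom, lopsided)
```

The top/bottom split passes the test even though its squared error is 25 times that of the left/right split, a concrete case of the condition being necessary but not sufficient. The lopsided partition fails, and reassigning the offending point to the nearer mean is the move ISODATA would make.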

The  Singleton-Kautz  Algorithm 


The Singleton-Kautz algorithm was developed by Dr. Richard K.
Singleton and Dr. William Kautz of Stanford Research Institute in 1965.
This algorithm seeks explicitly to minimize trace W. The algorithm uses
the following steps to perform this minimization.


(1)  All data points are assigned to a single cluster.

(2)  The data point farthest from the single cluster mean is
assigned to a second cluster.

(3)  All data points are sequentially tested to determine if a
reassignment to the second cluster will reduce the sum-squared
error (SSE). (Fortunately the computation requires only the
evaluation of the change resulting from the reassignment.)**

(4)  When it is no longer possible to reduce the SSE by reassigning
any single data point to a different cluster, then the number
of clusters is increased by one and the process is repeated.
For data sets of 200 points experience indicates that about
four cycles through the data points lead to the situation in
which no single data point reassignment will result in a reduction
of the SSE. For larger numbers of data points the number
of cycles may increase considerably. (See discussion of this
point in Sec. V, below.)


* For similar techniques see also Forgy (1966) and Friedman & Rubin (1966).

** The quantity calculated for the cluster to which data point $x$ is assigned
is

$$ \frac{N_G}{N_G + 1} \sum_{i=1}^{D} \Bigl( x_i - \frac{S_i^G}{N_G} \Bigr)^2, $$

where $N_G$ is the number of data points in the $G$th cluster and $S_i^G$ is the
sum of the $i$th coordinate of all points in the $G$th cluster, excluding $x$.
This quantity is compared with

$$ \frac{N_H}{N_H + 1} \sum_{i=1}^{D} \Bigl( x_i - \frac{S_i^H}{N_H} \Bigr)^2 $$

for clusters $H$, and the data point is moved if the former exceeds the latter.
(5)  The number of clusters is increased to some maximum number of
clusters. This maximum number is set by the person using the
program.

(6)  When the limit is reached the cycle is reversed: the
number of clusters is reduced by combining the two clusters
whose combination increases the SSE least.
After combining those two clusters, the cycle described above
is then used to attempt to further decrease the SSE. If the
SSE found on this stage through the cycle is smaller than the
SSE found in any previous stage, then the partition obtained
on this new clustering is substituted for the partitioning
found the previous time.

(7)  This increasing and decreasing of the number of clusters is
continued until it is not possible to reduce the SSE further.
At this point the process terminates.
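Steps (1) through (4) can be sketched as follows. This is an illustrative reading, not the authors' program: each new cluster is seeded with the point farthest from its own cluster mean (the text specifies this only for the second cluster), a point is moved when the SSE saved by removing it exceeds the SSE added by inserting it elsewhere (counts and sums taken with the point itself excluded), and a point that is alone in its cluster is never moved. The cluster-reducing phase of steps (6) and (7) is omitted; all names are hypothetical.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def tally(points, labels, k):
    """Per-cluster counts and coordinate sums."""
    d = len(points[0])
    counts, sums = [0] * k, [[0.0] * d for _ in range(k)]
    for x, g in zip(points, labels):
        counts[g] += 1
        for i in range(d):
            sums[g][i] += x[i]
    return counts, sums

def reassign_until_stable(points, labels, k):
    """Cycle through the points, moving any point whose move lowers the SSE."""
    d = len(points[0])
    counts, sums = tally(points, labels, k)
    moved = True
    while moved:
        moved = False
        for idx, x in enumerate(points):
            g = labels[idx]
            n_g = counts[g] - 1                      # cluster size without x
            if n_g == 0:
                continue                             # never empty a cluster
            mean_g = [(sums[g][i] - x[i]) / n_g for i in range(d)]
            best, best_cost = g, n_g / (n_g + 1) * sq_dist(x, mean_g)
            for h in range(k):
                if h == g:
                    continue
                mean_h = [s / counts[h] for s in sums[h]]
                cost = counts[h] / (counts[h] + 1) * sq_dist(x, mean_h)
                if cost < best_cost - 1e-12:
                    best, best_cost = h, cost
            if best != g:
                for i in range(d):
                    sums[g][i] -= x[i]
                    sums[best][i] += x[i]
                counts[g] -= 1
                counts[best] += 1
                labels[idx] = best
                moved = True
    return labels

def grow_partition(points, max_clusters):
    """Start with one cluster and add clusters one at a time (steps 1-4)."""
    labels, k = [0] * len(points), 1
    while k < max_clusters:
        counts, sums = tally(points, labels, k)
        means = [[s / counts[g] for s in sums[g]] for g in range(k)]
        far = max(range(len(points)),
                  key=lambda i: sq_dist(points[i], means[labels[i]]))
        if sq_dist(points[far], means[labels[far]]) == 0.0:
            break                                    # nothing left to split off
        labels[far] = k                              # seed the new cluster
        k += 1
        reassign_until_stable(points, labels, k)
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = grow_partition(points, max_clusters=2)
print(labels)
```

On this toy data the first three points end in one cluster and the last three in the other. Because only changes in SSE are evaluated, each test costs a few arithmetic operations per candidate cluster rather than a full recomputation.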

Critical steps in this process are the selection of the data point
used to initiate a new cluster, the ordering of data points,* and the
choice of those two clusters that are to be combined when the number
of clusters is reduced. These comments can be summarized by saying that
the choice of the starting points for the iterative hill-climbing to an
MSE partition determines whether the partition obtained is the minimum
among all local minimum squared error partitions of the data set.
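For the choice of which two clusters to combine, a standard identity is convenient: merging clusters with sizes n_g and n_h and means m_g and m_h raises the SSE by n_g n_h / (n_g + n_h) times the squared distance between the means. The sketch below uses that identity to pick the cheapest pair; it is offered as one plausible way to carry out the selection, not as the procedure the program actually uses, and the names and data are hypothetical.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(cluster):
    return [sum(c) / len(cluster) for c in zip(*cluster)]

def sse(cluster):
    m = mean(cluster)
    return sum(sq_dist(x, m) for x in cluster)

def merge_cost(a, b):
    """Increase in SSE caused by combining clusters a and b."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * sq_dist(mean(a), mean(b))

def cheapest_merge(clusters):
    """Indices of the pair whose combination increases the SSE least, with the cost."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            cost = merge_cost(clusters[i], clusters[j])
            if best is None or cost < best[2]:
                best = (i, j, cost)
    return best

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(1.0, 1.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
i, j, cost = cheapest_merge(clusters)
direct_increase = sse(clusters[i] + clusters[j]) - sse(clusters[i]) - sse(clusters[j])
print(i, j, cost, direct_increase)
```

The identity agrees with the directly computed increase, so the pair can be chosen from cluster sizes and means alone.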

The  ISODATA  Algorithm 


The name ISODATA (see Ball and Hall, 1965, or Ball and Hall, 1966)
applies to a variety of similar cluster-seeking techniques.** The defining
characteristics of these techniques are:

(1)  the iterative nature of the algorithm,

(2)  the partitioning of all the data points into subsets without
changing the cluster averages, such that data points are
assigned to the closest previously obtained cluster average,

(3)  the combining of pairs of clusters into a single cluster,

(4)  the splitting of single clusters into a pair of clusters.


* These data points could be, but at present are not, randomly reordered
after each sequence of evaluations in order to reduce any sequential
effects of taking the patterns one at a time in a particular order.

** See also Sebestyen and Edie (1964), Sebestyen (1966), MacQueen (1966),
and Stark (1962) for similar techniques.




Figure 7.1 shows a pictorial flow diagram of ISODATA. The patterns
are sorted, one by one, on the basis of a measure of distance from a
set of initial cluster points. Each pattern goes into that subset having
the cluster point to which it is closest.


After all patterns have been sorted into one of the clusters, the
average of each of these subsets of patterns is computed, and for each
subset the standard deviation in each dimension is determined. These
values are then passed into the Cluster Information Hopper.
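The sorting and summarizing pass just described can be sketched as below. This is an illustration under stated assumptions rather than the ISODATA program itself: squared Euclidean distance is used as the distance measure, the standard deviation is the population form (divide by n), and the function names are hypothetical.

```python
import math


def sort_patterns(patterns, cluster_points):
    """Assign each pattern to its closest cluster point (points held fixed)."""
    subsets = [[] for _ in cluster_points]
    for x in patterns:
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c))
              for c in cluster_points]
        subsets[d2.index(min(d2))].append(x)
    return subsets


def subset_statistics(subset):
    """Return (average, per-dimension standard deviation) of a non-empty subset."""
    n = len(subset)
    dims = range(len(subset[0]))
    avg = [sum(x[d] for x in subset) / n for d in dims]
    std = [math.sqrt(sum((x[d] - avg[d]) ** 2 for x in subset) / n)
           for d in dims]
    return avg, std


# Hypothetical two-dimensional patterns and initial cluster points.
patterns = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
subsets = sort_patterns(patterns, [(1.0, 1.0), (9.0, 9.0)])
avg, std = subset_statistics(subsets[0])
```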


The individual sample points in small clusters (those with fewer than
$\theta_N$ elements are considered small) are removed from the data set and set
aside for special examination. Splitting or lumping of clusters takes
place next. Splitting takes place if the conditions described below are
met. Lumping occurs between the NCLST closest pairs of cluster centers
that are less than $\theta_C$ apart, where NCLST is a control parameter. The
process control parameters, NCLST, $\theta_N$, and $\theta_C$, as well as others, are
supplied by the data analyst.


After each lumping or splitting, the updated set of average points
is  used  as  the  set  of  cluster  points  for  the  next  iteration.  Several 
statistics  of  the  data  structure  are  calculated  and  printed  out. 


The  partitioning  can  be  and  has  been  done  with  respect  to  a  variety 
of  measures  of  similarities  of  data  points  to  cluster  averages.  The 
measures  of  similarity  used  thus  far  are: 

(1) Normalized dot products between data points {x} and cluster
averages {m}, where the normalization is with respect to the
magnitudes of the means and the data points. This can be
expressed as $(x \cdot m) / (\|x\| \, \|m\|) = \cos \angle(x, m)$.

(2) The dot product between the data point and the cluster averages.
This can be written as $x \cdot m = \|x\| \, \|m\| \cos \angle(x, m)$.

(3) Euclidean distance squared. This can be written as
$\|x - m\|^2 = x \cdot x - 2\, x \cdot m + m \cdot m = (x - m)(x - m)'$.

(4) Mahalanobis distance, which includes Euclidean distance as a
special case, which can be written as $(x - m)\, W^{-1} (x - m)'$,
where W is the pooled covariance matrix, or the sum-of-products-within
matrix.
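The four measures can be written out directly. The sketch below is illustrative only; it assumes the Mahalanobis form uses the inverse of the pooled covariance matrix, which the caller supplies as `w_inv` (a hypothetical parameter name), so that no matrix inversion is needed inside the function.

```python
import math


def dot(x, m):
    """Measure (2): plain dot product."""
    return sum(a * b for a, b in zip(x, m))


def normalized_dot(x, m):
    """Measure (1): cosine of the angle between x and m."""
    return dot(x, m) / (math.sqrt(dot(x, x)) * math.sqrt(dot(m, m)))


def euclidean_sq(x, m):
    """Measure (3): squared Euclidean distance, expanded as in the text."""
    return dot(x, x) - 2 * dot(x, m) + dot(m, m)


def mahalanobis_sq(x, m, w_inv):
    """Measure (4): (x - m) W^-1 (x - m)' for a supplied inverse matrix."""
    d = [a - b for a, b in zip(x, m)]
    return sum(d[i] * w_inv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


x, m = (3.0, 4.0), (3.0, 0.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
```

With the identity matrix as `w_inv`, the fourth measure reduces to the third, which is the special case mentioned in item (4).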


As would be expected, these different measures of similarity result
in different clusterings of a given set of data points. As can be seen
from the various equations describing these measures of similarities,
there is also considerable similarity between them. The normalized dot
product measures distances only in angles between vectors. For this
reason it is quite sensitive to the selection of the origin with respect
to which these angles are to be measured. The dot product is not only
sensitive to the selection of origin, but it is also sensitive to the
magnitude of the vector data points and cluster averages as well. The
Euclidean distance does not depend on the choice of origin and can be
viewed as using an additive normalization of the dot product measure of
similarity to make it independent of the origin. Euclidean distance is
invariant with respect to orthogonal transformations (i.e., rotations)
of the data. The Mahalanobis distance is sensitive to the covariance
of the data points around the various cluster centers and is invariant
with respect to linear transformations of the data, as well as invariant
to the position of the origin.
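These invariance properties can be verified in a few lines. The derivation below is a sketch that uses the row-vector convention of the formulas in this section, with $W^{-1}$ the inverse of the pooled covariance matrix. It assumes (the text does not say so explicitly) that W is recomputed from the transformed data, so that it becomes $A'WA$ under the map $x \to xA + t$. Here Q is an orthogonal matrix, A a nonsingular matrix, and t a translation.

```latex
\begin{align*}
\|(xQ + t) - (mQ + t)\|^2
  &= (x - m)\, Q Q' \,(x - m)' = (x - m)(x - m)' , \\
(xA - mA)\,(A' W A)^{-1}\,(xA - mA)'
  &= (x - m)\, A A^{-1} W^{-1} (A')^{-1} A' \,(x - m)'
   = (x - m)\, W^{-1} (x - m)' , \\
(x + t) \cdot (m + t)
  &= x \cdot m + t \cdot (x + m) + t \cdot t .
\end{align*}
```

The last line shows why the dot product depends on the origin: a translation t adds terms that vary with x and m.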

The division of a single cluster into two clusters in ISODATA, which
we call splitting, involves first the evaluation of the desirability of
dividing the cluster into two clusters and, secondly, a procedure for
doing this splitting. In the original ISODATA algorithm, splitting was
performed by setting an arbitrary process control parameter $\theta_E$ and then
evaluating each cluster on the basis of whether the maximum standard
deviation along any of the dimensions for that cluster exceeded
$\theta_E$. If $\theta_E$ was exceeded, the cluster was split. Certain problems result
when this is done. In particular, it is possible to select the value of
$\theta_E$ such that a cluster is split and then at a later time the two
resulting clusters are recombined because the distance between the means
of the two resulting clusters is too small relative to the value of the
parameter $\theta_C$ that controls when two clusters are to be recombined. The
dependence of the evaluation on only one dimension was also felt to be
inadequate.

A new procedure now programmed with the ISODATA algorithm performs
a trial splitting for each of the clusters. This new splitting criterion
functions as follows:

(1)  Find  that  one  dimension  among  the  original  coordinates  of  the 
data  having  the  largest  standard  deviation  about  the  mean  of 
the  cluster. 

(2) Sort the data into two subsets: a subset consisting of all
patterns having a value larger than the mean in that one
coordinate, and a subset consisting of all patterns having
values smaller than the mean in that one coordinate. (Note
that a comparison of one component of the data vector with the
threshold is all that is required for this step.)

(3)  Find  the  means  of  these  two  subsets. 

(4) Use the magnitude of the vector difference between these two
means as an approximation to the distance that would exist
between the two cluster centers resulting from the split. (It
is an approximation because the effect of the patterns in the
other clusters is not taken into account.)

(5) Compare this magnitude with the threshold $(1.1)\theta_C$ and split
the cluster if that threshold is exceeded. The threshold $\theta_C$
is the parameter that determines when two clusters are to be
combined into a single cluster (lumping). The advantage of
the new splitting criterion is that it is now a global splitting
criterion in the sense that one measures the distance between
the new cluster means after splitting using all of the dimensions
rather than just evaluating the cluster on the basis of the
largest standard deviation in any one dimension. It has the
further advantage that it will make possible, although this
has not yet been implemented, the selection of that cluster
that will maximally decrease the squared error when split.
This will be useful if the ISODATA algorithm is to be used to
trace out the curve of MSE versus the number of clusters, as
is done in the Singleton-Kautz algorithm.
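The five steps of the trial split can be sketched for a single cluster as follows. This is a hedged illustration, not the programmed procedure: the function name `trial_split`, the parameter names, and the handling of patterns that fall exactly on the mean (placed in the lower subset) are assumptions. The lumping threshold is passed as `theta_c` and the 1.1 multiplier as `factor`.

```python
import math


def trial_split(cluster, theta_c, factor=1.1):
    """Return (should_split, mean_low, mean_high) for one cluster."""
    n = len(cluster)
    dims = range(len(cluster[0]))
    mean = [sum(x[d] for x in cluster) / n for d in dims]
    # Step 1: coordinate with the largest spread (variance ranks like std).
    var = [sum((x[d] - mean[d]) ** 2 for x in cluster) / n for d in dims]
    k = var.index(max(var))
    # Step 2: compare one component of each pattern with the mean.
    high = [x for x in cluster if x[k] > mean[k]]
    low = [x for x in cluster if x[k] <= mean[k]]
    if not high:  # all patterns identical in that coordinate
        return False, None, None
    # Step 3: means of the two subsets.
    m_high = [sum(x[d] for x in high) / len(high) for d in dims]
    m_low = [sum(x[d] for x in low) / len(low) for d in dims]
    # Step 4: distance between the trial means, using all dimensions.
    gap = math.dist(m_low, m_high)
    # Step 5: split only if the gap exceeds factor * theta_c.
    return gap > factor * theta_c, m_low, m_high


cluster = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
result = trial_split(cluster, theta_c=5.0)
```

Because the test compares the trial gap with a multiple of the lumping threshold, a cluster that passes it should not be lumped back together on the next pass, which addresses the split-then-recombine problem noted earlier.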

The recombining or lumping of two clusters depends on measuring
the Euclidean distance between all pairs of cluster averages and comparing
this distance with a threshold $\theta_C$. In the past, all clusters
having inter-pair distances less than $\theta_C$ have been recombined. In
the future it may be desirable to combine that single pair of clusters
that minimally increases the squared error. This would be simple to do
because the sum-squared error is a function only of the overall mean,
the two cluster means that are being considered for recombination, and
the number of patterns in each cluster. If this were done, it would
result in the complete elimination of the process parameters that have
been used to control the ISODATA process. In certain cases it seems that
removal of these parameters from consideration would be useful. In other
situations, when we wish only to use the magnitude of the distance between
the cluster centers to determine the number of clusters, it may be
desirable to retain $\theta_C$.
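Threshold-based lumping can be sketched as below: select the closest pairs of cluster centers that lie nearer than the lumping threshold, up to the limit set by the control parameter NCLST. The restriction that a cluster takes part in at most one lumping per pass, and the size-weighted average used for the combined center, are assumptions made for this illustration.

```python
import math
from itertools import combinations


def pairs_to_lump(centers, theta_c, nclst):
    """Up to nclst closest pairs of centers lying less than theta_c apart."""
    close = sorted(
        (math.dist(centers[i], centers[j]), i, j)
        for i, j in combinations(range(len(centers)), 2)
        if math.dist(centers[i], centers[j]) < theta_c
    )
    chosen, used = [], set()
    for _, i, j in close:
        if len(chosen) >= nclst:
            break
        if i in used or j in used:
            continue  # assumption: one lumping per cluster per pass
        chosen.append((i, j))
        used.update((i, j))
    return chosen


def lump(n_i, mean_i, n_j, mean_j):
    """Size and size-weighted average of the combined cluster."""
    n = n_i + n_j
    return n, [(n_i * a + n_j * b) / n for a, b in zip(mean_i, mean_j)]


centers = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (10.0, 10.0)]
chosen = pairs_to_lump(centers, theta_c=2.0, nclst=2)
```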

Output  From  Computer  Programs 


Given that we have performed the clustering of a body of data, there
remains the question of what particular facts about that clustering we
wish the computer to print out for our further examination. We have
found that the averages of the clusters, a list of the data points in
each cluster, the distances between cluster centers, and some statistics
on the within-cluster spread versus the between-cluster spread are
particularly helpful.


III NEW INFORMATION RELATING TO MSE CLUSTER-SEEKING TECHNIQUES


Two  recent  results  are: 

(1) That the curve of the MSE versus the number of clusters is not
convex but that it is "star-shaped," which is a weakened form
of convexity.

(2) That the ISODATA algorithm can converge to a partition that is
not a local minimum.

The  Shape  of  the  MSE  Curve 

Dr. Richard Singleton* has been able to show that the curve displaying
the MSE versus the number K of clusters is not convex. The
counterexample he obtained is shown in Fig. 7.2. He has been able to
show, however, that while the curve is not convex with respect to all
possible pairs of points, it exhibits convexity with respect to those
pairs of points having as one member of the pair either K = 1 or K = N,
where N is the number of data points. This form of weak convexity has been
described previously and labeled "star-shaped." (See Bruckner and Ostrow,
1962, for a further discussion of star-shapedness.)

It is worth noting that, at least in appearance, the weakening of
the convexity of this curve to star-shaped form does not appear to allow
the MSE vs. K curve to be very non-convex. In the future we hope to
use the star-shapedness of the MSE vs. K curve in evaluating an
empirically obtained MSE vs. K curve. We would test the star-shapedness of
the curve and when, for a particular value of K, the MSE(K) violates
this star-shaped condition, we would attempt to find a new partition
such that the curve becomes star-shaped.
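A test of this kind on an empirical curve might be sketched as follows. The code encodes one reading of the condition, stated here as an assumption: for every chord drawn from the point at K = 1 (or from the point at K = N) to any other point of the curve, no intermediate point may lie above the chord. The function names are hypothetical, and `ks` is assumed to be sorted with the endpoint values in first and last position.

```python
def chord_violations(ks, mse, anchor, tol=1e-12):
    """Values of K lying above some chord drawn from the anchor point."""
    a_k, a_v = ks[anchor], mse[anchor]
    bad = set()
    for k_end, v_end in zip(ks, mse):
        if k_end == a_k:
            continue
        lo, hi = sorted((a_k, k_end))
        for k, v in zip(ks, mse):
            if lo < k < hi:
                chord = a_v + (v_end - a_v) * (k - a_k) / (k_end - a_k)
                if v > chord + tol:
                    bad.add(k)
    return sorted(bad)


def star_shaped_violations(ks, mse):
    """Ks violating the chord condition from either end of the curve."""
    first = chord_violations(ks, mse, 0)
    last = chord_violations(ks, mse, len(ks) - 1)
    return sorted(set(first) | set(last))


ks = [1, 2, 3, 4]
ok_curve = [9.0, 4.0, 1.0, 0.0]    # hypothetical, convex
bad_curve = [9.0, 8.5, 1.0, 0.0]   # hypothetical, MSE(2) too high
```

A value of K flagged by such a check would mark a partition worth re-seeking, since the condition implies that a lower MSE exists for that K.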

Convergence to Non-Local Minima


The representation of a one-dimensional data set as a contour map
of SSE($\theta_1$, $\theta_2$) allows us to investigate the dynamics of a "simple" ISODATA
process (one without splitting or lumping). This plot is shown in
Fig. 7.3. This representation gives the value of the sum of the squared
error as a function of the position of two thresholds placed along the
real line for the data shown in Fig. 7.4. In using this representation
we use the knowledge that the convex hulls of an MSE partition cannot
intersect. The tracks shown on the contour plot show how this "simple"
ISODATA algorithm shifted thresholds from iteration to iteration. In
"simple" ISODATA we used cluster averages obtained from one iteration to
define the threshold positions for the next iteration, which in turn
defined the cluster averages for that iteration. We see that this
"settling process" does not always find even a local minimum of the
sum-squared error surface but that it may (owing, we believe, primarily to
the discreteness of the data) stop on a "shelf" in the SSE function
fairly remote from a local minimum point of the sum-squared error surface.
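The "simple" ISODATA settling process on one-dimensional data can be sketched as below. The sketch assumes that each new threshold is the midpoint between adjacent cluster averages (equivalent to nearest-average assignment on the real line) and that iteration stops if a cluster becomes empty. The data values and function names are hypothetical and are not those of Fig. 7.4.

```python
def split(data, thresholds):
    """Partition 1-D data by sorted thresholds into len(thresholds)+1 groups."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for x in data:
        groups[sum(x > t for t in thresholds)].append(x)
    return groups


def sse(data, thresholds):
    """Sum of squared error of the partition defined by the thresholds."""
    total = 0.0
    for g in split(data, thresholds):
        if g:
            m = sum(g) / len(g)
            total += sum((x - m) ** 2 for x in g)
    return total


def settle(data, thresholds, max_iter=100):
    """Track of threshold positions from iteration to iteration."""
    track = [list(thresholds)]
    for _ in range(max_iter):
        groups = split(data, track[-1])
        if any(not g for g in groups):
            break  # assumption: stop rather than handle an empty cluster
        means = [sum(g) / len(g) for g in groups]
        new = [(a + b) / 2 for a, b in zip(means, means[1:])]
        if new == track[-1]:
            break  # thresholds no longer move: the process has settled
        track.append(new)
    return track


data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
track = settle(data, [1.5])
```

Evaluating `sse` over a grid of threshold pairs gives the kind of contour surface described above, and `track` gives the path the settling process takes across it.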


* R. Singleton, internal SRI memorandum, June 1966.

The contour plot also illustrates the existence for this data set
of two minima of the sum-squared error function for three clusters.
Using the plot, we have obtained examples having all four combinations
of one or two minima for two clusters and one or two minima for three
clusters. At a future time we expect to use this plot to help us
further in examining the relationship between the Singleton-Kautz
algorithm and ISODATA.*


* The normal Singleton-Kautz algorithm was run on this data, and it did
find the MSE partition. We will run the regular ISODATA program on
this data shortly and will give the results in the final version
of this paper.


IV  DATA  SETS 


In  this  section  we  describe  sets  of  data  so  constructed  that  we 
believe that they will bring out the sensitivities of cluster-seeking
techniques  that  are  applied  to  them.  These  data  sets  can,  we  believe, 
test  the  power  of  cluster-seeking  techniques  to  suggest  structure  in 
data.  These  data  sets  should  also  be  useful  in  interpreting  the  results 
of  clustering,  since  similar  results  on  data  of  known  structure  might 
indicate  a  similarity  in  data  structure  between  this  data  and  data  of 
unknown  structure. 

We  have  designed  data  sets  to  embody  many  conventional  assumptions 
regarding  data.  In  the  first  several  sets  of  data  it  is  most  convenient 
to  describe  these  assumptions  in  statistical  terms. 

Data  Set  1  consists  of  a  mixture  of  normal  distributions  of  varying 
means,  with  each  distribution  having  as  covariance  matrix  the  same 
scalar  multiple  of  the  identity  matrix.  (See  Fig.  7.5) 

Data  Set  2  has  the  same  mean  values  as  Data  Set  1,  with  the 
covariance  matrices  being  the  same  for  all  clusters  but  no  longer 
diagonal.  (See  Fig.  7.6) 

Data  Set  3  uses  the  mean  values  of  Data  Set  1  with  different 
covariance  matrices  for  each  cluster.  (See  Fig.  7.7) 
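The first three data sets differ only in their covariance structure, which the following sketch makes concrete in two dimensions. The means, covariance factors, cluster sizes, and function names are hypothetical illustrations and are not the values used for the actual data sets. Each covariance matrix is specified through a lower-triangular factor L, so that the covariance is L L'.

```python
import random


def sample_cluster(rng, mean, chol, n):
    """Draw n 2-D points from N(mean, L L'), with chol = L lower triangular."""
    (l11, _), (l21, l22) = chol
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pts.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return pts


def make_data_set(means, chols, n_per_cluster=25, seed=0):
    """Mixture of normal clusters, one (mean, factor) pair per cluster."""
    rng = random.Random(seed)
    return [p for m, c in zip(means, chols)
            for p in sample_cluster(rng, m, c, n_per_cluster)]


means = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]   # hypothetical means

# Data Set 1 style: the same scalar multiple of the identity everywhere.
spherical = [((0.5, 0.0), (0.0, 0.5))] * 3
# Data Set 2 style: one shared, non-diagonal covariance.
shared_full = [((0.5, 0.0), (0.4, 0.3))] * 3
# Data Set 3 style: a different covariance for each cluster.
varying = [((0.5, 0.0), (0.0, 0.5)),
           ((1.0, 0.0), (0.8, 0.2)),
           ((0.2, 0.0), (-0.1, 0.9))]

data_1 = make_data_set(means, spherical)
```

Fixing the seed keeps the generated points reproducible, so different techniques can be compared on identical data.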

Data Sets 4 and 5 have characteristics similar to Data Set 3.
Variation in a few dimensions of each cluster is low, but there is very
high variation in the other dimensions. These data sets are meant to
relate to data in which some measurements are very important under some
conditions while other measurements are very important under other
conditions. Cluster-seeking techniques ultimately should be able to
isolate each underlying distribution by finding those dimensions that
are of small variability. (This data can be viewed as measuring the
technique's ability to cluster data points and variables simultaneously.)
(See Fig. 7.8)

Data Set 6 tests for the cluster-seeking technique's ability to
deal with variations in the size of clusters in different regions.
(See Fig. 7.9)

Data  Set  7  tests  sensitivity  to  local  variations  in  the  data 
structure.  In  this  data  set  the  small  blob  of  data  points  isolated 
from  the  main  string  by  a  region  of  practically  zero  pattern  density 
is  the  important  feature  of  the  data.  Minimum  squared  error  partitions 
assuming  identical  covariance  matrices  for  all  distributions  will,  in 
general,  not  find  the  small  group  until  the  large  group  has  been  broken 
down  into  many  small  groups.  (See  Fig.  7.10) 

Data Set 8 tests sensitivity for overlapping mixtures of Gaussian
distributions.  (See  Fig.  7.11) 

* At this time only Data Sets 1, 2, 3, 7, 12, 16, 18, 19, and 20 exist as a
set of data points. The other data sets will be generated in the near
future.


Data  Set  9  should  be  sensitive  to  cluster-seeking  techniques  that 
look  for  non-linear,  essentially  one-dimensional  data  embedded  in  a 
high-dimensional  space  with  mixed  data  populations.  An  example  of  a 
process  that  might  generate  this  kind  of  data  might  be  data  derived 
from  a  particular  word  in  the  English  language  spoken  many  times  by  each 
of ten different speakers. If measurements are made on this word over
a number of instants of time, the word itself can be viewed as a trajectory
in some data space. Since there is no guarantee that even the same
speaker speaking the same word will say it in the same way, particularly
if  one  attempts  to  vary  the  environmental  conditions  under  which  words 
are  spoken,  it  is  helpful  to  be  able  to  break  apart  words  that  are  spoken 
differently  and  yet  still  be  able  to  combine  words  that  are  spoken  very 
similarly.  (See  Fig.  7.12) 

Data  Set  10  should  be  sensitive  to  those  techniques  that  seek  to 
isolate  patterns  into  clusters,  based  primarily  on  the  absence  of 
patterns between clusters rather than on variability within clusters.
(See Fig. 7.13)

Data  Set  11  examines  the  sensitivities  of  techniques  to  particular 
kinds  of  constraints  placed  on  the  data.  In  this  case,  the  constraint  is 
that the data all lie on a spherical hypershell.

Data Set 12 consists of uniformly distributed random data. It provides
a good test for the sensitivity of techniques to structure within
data. If it is not easy to tell from the output of a program the
difference between uniformly random data and the clustered data of Data
Set 1, then we would have to assume that the particular technique being
tested would probably be extremely difficult to interpret without further
information being provided by the program. (See Fig. 7.14)


Data Set 13 is a collection of Gaussian distributions whose means
lie in a two-dimensional space and with data points in a three-dimensional
space. The data points have been rotated so that they lie in a
three-dimensional subspace of a six-dimensional space. This data set provides
a means for evaluating our ability to interpret results from
high-dimensional data when that data can be exactly characterized in a
lower dimensional space.

Data Set 14 is very similar to the preceding one, but instead of a
simple rotation into a higher dimensional space, a non-linear
transformation was used so that linear techniques like principal components
will not help much. (See Fig. 7.15)


Data  Set  15  consists  of  data  generated  from  complicated  models  plus 
noise,  in  order  to  see  if  we  can  recover  hints  as  to  the  nature  of  the 
model.  These  data  sets  are  probably  closer  to  those  obtained  from  a 
scientific  experiment  in  which  we  have  only  a  vague  idea  as  to  the 
underlying  processes  and  wish  to  use  the  cluster-seeking  technique  to 
suggest  what  the  underlying  processes  might  be. 


Data Set 16 consists of the one-dimensional data mentioned earlier that is
known to have certain characteristics with respect to minimum squared
error (See Fig. 7.4). It has been added so that studies can be made of
the dynamic process by which various cluster-seeking techniques arrive at
a particular partition of the data.

Data Set 17 consists of five-dimensional data for which an attempt
has been made to minimize the information obtained from a marginal
distribution along any dimension or pair of dimensions in a scatter plot and
so situated that a principal components analysis gives little information.
The data itself is well-clustered in the sense that for each cluster,
within-cluster deviations are very small with respect to the distance
between a cluster and its closest neighbor.

Data Set 18 is the historic Fisher-Kendall data set describing four
measurements made on three species of Iris. This data is included to aid
the comparisons between techniques that have been developed over a
considerable period of time, since this data set has been used by a number
of experimenters. It is, however, a fairly simple set of data.

Data Set 19 is a large body of data consisting of 20 measurements on
1000 data points, provided by Dr. Bernard Glueck of the Institute for
Living. This data does not have well-known structure and is probably
rather complicated. It is a test not only of our ability to interpret the
data, but it also provides a good evaluation of the technique's
capabilities with respect to large data sets of real data with relatively high
numbers of dimensions.

Data Set 20 consists of 122 measurements made on 97 species of bees,
by Michener and Sokal, and has been included to provide a data set in
which the number of measurements exceeds the number of data points.

It is hoped that these data sets will provide a sufficient
experimental exercising of a proposed cluster-seeking technique to provide a
reasonably good understanding of the capabilities of this technique.

Because of the large number of data sets, we discuss only Data Sets 1,
2, and 3 in comparing the Singleton-Kautz algorithm and ISODATA.

COMPARISON OF TECHNIQUES


The comparison of the two techniques is divided into a section
dealing with verbal and graphical comparisons, a second stating
analytical differences and similarities, and a third dealing with experimental
results.

Assumptions. The Singleton-Kautz algorithm and ISODATA assume that
a disjoint partition of the data set with relatively homogeneous data
points being placed in the same partition is useful. Homogeneity is
measured by a "distance" to a cluster average. They assume that the
particular distance measure that they use is valid. Particular
variations of these techniques are obtained in the case of the
Singleton-Kautz algorithm by modifying the criterion against which improvement in
the partitioning is measured, and in the ISODATA technique by modifying
the measure of similarity and by modifying the procedure by which
clusters are split and lumped. Global or local evaluating criteria can
be used with ISODATA to further constrain the solution obtained. No
explicit distributional assumptions are made in either of these techniques.
However, it is assumed that the distance measure or the criterion used
is adequate to reflect the structure of the data accurately.

Economies of Description. These cluster-seeking techniques describe
those situations most economically in which isolated clusters of data
exist with dimensional variation that is high in the sense that the
covariance matrix of the means of these clusters is of rank nearly equal
to that of the data space. These techniques are not particularly
efficient in describing relatively uniform random variability that occurs
within a low-order linear subspace of the original data space. For
these situations the factor-analytic techniques that look at these linear
subspaces seem more appropriate. They can, however, still be used in
these situations to provide empirical data categories. The
cluster-seeking techniques try to group patterns so that the average squared
distance from cluster means is not significant. Factor-analytic
techniques seek to place the data in a lower dimensional space and then
retain the full variability of the data within that lower dimensional
space.

Limitations. These cluster-seeking techniques, when using either a
criterion or a measure of similarity corresponding to Euclidean distance,
are sensitive to changes in scaling, although they are not sensitive to
rotations of the data or the position of the origin. Changes in the data
set that affect normalizations based on the data sets, such as the
standard deviation about a mean, may modify the clustering obtained.
When the ISODATA technique is used with the Mahalanobis distance it is
relatively insensitive to the scaling of the data. The results of using
these techniques are frequently difficult to interpret because these
results have a large number of degrees of freedom. Therefore, any simple
interpretation usually could arise from a great variety of data sets.
Hence these techniques, when used to obtain too simple a description,
may provide little interpretive discrimination between data sets.
(This same criticism holds for most techniques based solely on the
covariance matrix.) Complex descriptions that more accurately reflect
details of the data are apt to be confusing.

Invariances. The Singleton-Kautz algorithm is invariant with
respect to orthogonal transformations and translations of the data. The
ISODATA technique using Euclidean distance is also invariant with respect
to orthogonal rotations of the data and translations of the origin of the
data. If the Mahalanobis distance is used, then ISODATA is largely
invariant with respect to linear transformations as well as with respect
to translations of the data. These techniques both tend to produce
different results if individual data points are deleted from the data
set, particularly if these data points are "outliers" or "wildshots."

It is well to reiterate that different kinds of data may be
invariant with respect to the clustering procedure in that the clustering
procedure may not be sensitive to the ways that these data sets vary.
If the particular variability is important, then a technique has to be
developed that is sensitive to this variation. For example, if scale is
important and there is some natural way of defining the scale, or where
there is a desire to weight certain variables more heavily, then
invariance with respect to scale would not be a desirable feature for
a technique.

Extensions to Different Problems. ISODATA appears to be more
directly extendable to the clustering of points around line segments or
planar sections. At this time an algorithm is being programmed* to
cluster points around line segments. This algorithm uses the following
notions that exist in the ISODATA algorithm:

(1) The creation of new cluster centers (the cluster centers are
now line segments).

(2) The evaluation of the usefulness of a given line segment.

(3) The iterative shifting of the line segment to place it in a
"better" position.

(4) The combination of those line segments that can be combined
without greatly reducing the information we have regarding
the structure of the data.

When we have completed programming this algorithm we will
investigate the desirability and the feasibility of clustering data around
triangular planar sections. This would enable us to approximate
mixtures of non-linear two-parameter surfaces that are embedded in a
hyperspace.


* James Eusebio of SRI has done all of the programming and much of the
work constructing this algorithm.

Goals. These cluster-seeking techniques have as their goal the
determination of structure in data. They are sensitive to density
variations in the data in the original high-dimensional space. Both
techniques can be used on any kind of data (including nominal data that
is correctly encoded). Interpretations must correspond to the
assumptions given above that were made in developing the techniques and
whether these assumptions are being met by the data.

Analytical Comparison. These techniques minimize a criterion
subject to certain constraints. The Singleton-Kautz algorithm explicitly
evaluates and minimizes SSE as a global criterion. It is constrained in
this minimization to make reassignments of single data points. It
performs this minimization by hill climbing (or really, valley descending)
from a variety of starting positions and then selecting for one cluster
up to KMAX clusters the lowest value of SSE found in the various tries
as the overall minimum.

The ISODATA technique tends toward implicit minimization of SSE by
requiring that a stable partition meet the necessary condition given
above. Its settling procedure is not as powerful as the single move
minimization, as can be seen in Fig. 7.16. For the data of this figure
either threshold satisfies the conditions for a stable ISODATA partition.
Only the optimum threshold satisfies the stopping criterion of the
single move algorithm. The ISODATA technique is constrained to find that
minimum squared-error partition that keeps the minimum distance between
the means of all pairs of clusters greater than $\theta_C$.
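The difference between the two stopping conditions can be seen from the exact change in SSE when one point is reassigned. The sketch below uses the standard result that moving a point x from a cluster of size n_from to a cluster of size n_to changes the SSE by n_to/(n_to + 1) times the squared distance to the receiving mean, minus n_from/(n_from - 1) times the squared distance to the current mean. The data are hypothetical and are not those of Fig. 7.16; the function name is an assumption.

```python
def move_delta(x, n_from, mean_from, n_to, mean_to):
    """Change in SSE if x moves between clusters (negative = improvement).

    Requires n_from > 1 so that the source cluster is not emptied.
    """
    d_from = sum((a - b) ** 2 for a, b in zip(x, mean_from))
    d_to = sum((a - b) ** 2 for a, b in zip(x, mean_to))
    return n_to / (n_to + 1) * d_to - n_from / (n_from - 1) * d_from


# Hypothetical 1-D partition: A = {0, 2} with mean 1.0, and
# B = {3.0, 3.2, 3.6, 3.8} with mean 3.4.  The point 2.0 is closer to
# its own mean (distance 1.0) than to B's mean (distance 1.4), so a
# nearest-average rule leaves the partition unchanged.
delta = move_delta((2.0,), 2, (1.0,), 4, (3.4,))
```

Here `delta` is negative, so a single move still lowers the SSE even though every point already lies nearest its own cluster average. The size factors account for the shift in both means that the move causes, which the nearest-average rule ignores.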

The computation time for the "inner loop" of the Singleton-Kautz
algorithm and for the "inner loop" of the ISODATA program using Euclidean
distance as its measure of similarity is approximately equal. The
question of convergence per iteration remains to be examined, as does the
effect of large numbers of data samples and of high dimensionality of
the data.

Experimentally on 225 two-dimensional data points we have observed
that the Singleton-Kautz algorithm finds a partition of the data that
has an SSE that is about 10 per cent lower than that of the partition
found by ISODATA. We do have instances, however, when ISODATA has found
a lower SSE for this same type of data. Perhaps more importantly, for
many applications, we have recently noticed that the Singleton-Kautz
algorithm required 15 iterations through 1000 six-dimensional data samples
before it found a single move minimum for SSE. We plan to examine this
question further by making a detailed comparison of the iteration by
iteration reduction by these two algorithms of SSE on a variety of data
sets. This has not yet been done as it requires some modification of the
Singleton-Kautz program to allow the two programs to start from the same
partition of the data. The evidence thus far is that the Singleton-Kautz
algorithm generally finds a lower value of SSE than does ISODATA.

If a more complicated distance function is used, such as a sum of
the Mahalanobis type distances, then the necessity for the Singleton-Kautz
algorithm to invert a matrix after each data sample increases the
computation of its "inner loop."

Since ISODATA does not change W* until all of the patterns have
been re-sorted, ISODATA should have lower running times with the
Mahalanobis type of measure of similarity.

Experimental Comparisons. The experimental comparisons described
in this paper were confined to Data Sets 1, 2, and 3. The results are
summarized in Tables I, II, and III and Figs. 7.17 and 7.18.

For  Data  Set  1  the  Singleton-Kautz  Algorithm  and  ISODATA  produced 
identical  clusterings  pattern  for  pattern. 

TABLE I - EXPERIMENTAL RUNS ON DATA SET 1

Singleton-Kautz    ISODATA                         Data Means

1.9, 7.9                                           2,8
7.[?], 6.5                                         7,7 and 8,6
9.1, 1.1                                           9,1
2.9, 2.1           Identical to Singleton-Kautz    3,2
6.0, 3.0                                           6,3
1.0, 4.3                                           1,4
4.1, 8.9                                           4,9
5.1, 5.1                                           5,5

SSE 139.69         139.69

For Data Set 2 the clusterings were quite similar, with only two
out of ten cluster centers being very different.


TABLE II - EXPERIMENTAL RUNS ON DATA SET 2

Singleton-Kautz           ISODATA                  Data Means

4.8, 9.1                  5.0, 9.5                 4,9
8.4, 6.9                  9.0, 6.7 and 7.5, 7.3    8,6; 7,7
9.2, 1.3                  9.1, 1.2                 9,1
9,2.2 and 6.7, 3.4        5.4, 3.2                 6,3
2.1, 1.7                  2.5, 1.7                 3,2
1.7, 7.7                  2.2, 8.2                 2,8
3.3, 4.7                  2.0, 5.4
1, 3.8                    -.4, 3.4
6.1, 5.7                  6.5, 5.5

SSE 296.27                333.26


For  Data  Set  3  two  cluster  centers  are  markedly  different  in  those 
regions  having  few  data  points. 


TABLE III - EXPERIMENTAL RUNS ON DATA SET 3

Singleton-Kautz    ISODATA      Data Means

1.6, 8.4           1.6, 8.4     2,8
5.2, 8.3           5.1, 8.3
5.1, 3.5           5.0, 3.7
1.4, 3.8           .75, 3.5     1,4
11.7, .7           11.9, .5
8.2, 5.9           8.1, 6.0     8,6
7.2, 2.0           7.5, 2.2
2.3, 5.2           3.4, 5.4
2.1, .1            4.5, .85
-.8, 2.1           1.1, -.2

SSE 380.55         411.19

The Singleton-Kautz algorithm (SKA) found a lower-valued SSE
partition for Data Sets 2 and 3 than did ISODATA. The positions of the
clusters were almost the same in most instances. ISODATA quickly got
a  reasonably  good  partition  for  these  data  sets  but  was  very  slow  in 
improving  it.  SKA  found  a  reasonable  partition  almost  as  rapidly  for 
these  data  sets  and  improved  it  rapidly.  SKA  is  considerably  easier 
to  run,  since  it  systematically  provides  values  for  minimum  SSE  for  all 
numbers  of  clusters  up  to  KMAX.  However,  in  runs  on  other  data  that 
had  a  considerable  number  of  wildshots  and  was  of  higher  dimension, 
ISODATA  proved  to  be  easier  to  interpret  and  run  since  it  was  not  as 
affected  by  the  wildshots.  That  is,  ISODATA  rapidly  increased  the 
number  of  clusters  until  the  wildshots  were  isolated.  In  this  latter 
application,  ISODATA  was  more  effective. 

The statistics of the data that the two programs provide as output
are almost identical.

VI  INTERPRETING EXPERIMENTAL RESULTS


In this section we discuss:

(1)  An analytical technique for examining the amount of structure
in data.

(2)  The ways that graphs can usefully aid in interpreting
experimental results.

(3)  An interactive computer system for analyzing multivariate data.

Random Reshuffling of the Components of Data Points as a
Non-Parametric Test of Structure in Data

Dr. James MacQueen (1966) has suggested that the random rearrangement
of the values associated with each component of the set of data
vectors is a way in which a non-parametric test of the amount of
structure in the original data can be made. More precisely, consider an
ordered set of data points in which each data point is a row in a data
matrix and each variable has its values in a column in the data matrix.
First, the data is clustered using the original data points and some
measure (for example, SSE) is made of the reduction in variability
around local cluster means resulting from the clustering. Next, each
column of the data matrix is independently, randomly rearranged. This
causes the values of each variable for the data points to be randomly
associated with the values of other variables from other data points,
which tests if the specific associations found in the data are important.
The effect of this is to create a more or less uniform distribution of
data points within the rectangular hyper-parallelepiped that contains
all of the data points. If the disorganization (measured by SSE) increases
perceptibly on repeated trials of clustering of the reshuffled data, then
it can be said that statistically the original data was more structured
than would be expected on the basis of chance. In other words, if the
value of SSE for the original data is at the extreme lower end of all of
the sample values of SSE obtained by this random reorganization of the
data, one could say, with some statistical confidence, that the original
data was structured.
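The reshuffling test just described can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure as MacQueen or the authors programmed it: a basic k-means routine stands in for the clustering algorithm, the best of a few random restarts is taken as the SSE of a data set, and all function names and default parameters are hypothetical.

```python
import numpy as np

def kmeans_sse(data, k, rng, n_iter=50):
    """SSE of a basic k-means partition (a stand-in clustering method)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()

def best_sse(data, k, rng, restarts=5):
    """Lowest SSE over several random starts (assumed restart count)."""
    return min(kmeans_sse(data, k, rng) for _ in range(restarts))

def reshuffle_columns(data, rng):
    """Independently and randomly rearrange every column of the data matrix."""
    return np.column_stack([rng.permutation(col) for col in data.T])

def reshuffle_test(data, k, n_trials=20, seed=0):
    """Compare the SSE of the original data with SSE on reshuffled copies."""
    rng = np.random.default_rng(seed)
    observed = best_sse(data, k, rng)
    shuffled = np.array([best_sse(reshuffle_columns(data, rng), k, rng)
                         for _ in range(n_trials)])
    # Fraction of reshuffled trials whose SSE is at or below the observed SSE;
    # a small fraction suggests the original data is structured.
    fraction = float(np.mean(shuffled <= observed))
    return observed, shuffled, fraction
```

Each trial requires a full reclustering, which is the recomputation cost the text goes on to discuss.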

This seems an exceptionally important concept in the evaluation of
the results of cluster-seeking techniques. Its primary disadvantage lies
in the recomputation required, since first you must randomize the data
points and second you must recluster all of the randomized data. As the
cost of computer analysis is further reduced, this disadvantage is
reduced, particularly since an interpretation that the data has structure
may be greatly strengthened by this test.

Graphical Presentations


In using graphs to aid in the analysis of data, several points
seem particularly important. These graphs should include presentation
of residuals, and means of limiting the subset of variables and/or data
points plotted, or allow use of special symbols. Five such graphs are
described* in the following:

(1)  In any projection of a set of data points down onto a lower
dimensional space it seems important to know the amount of the
residual variation remaining that is not shown in the plot
itself. Such a plot is shown in Fig. 7.19, in which the data
is plotted with respect to a line in the hyperspace. Points
are plotted with respect to two coordinates: first, their distance
along the line in terms of the perpendicular projection down
onto the line, and, second, the distance they lie from the
line measured perpendicularly.

(2)  We can project high dimensional data down onto an arbitrary
plane. The distance perpendicular to that plane, i.e., the
residual variation, is indicated in the plot by the size of
the symbol. (See Fig. 7.20.)

(3)  The "metroglyph" suggested by Edgar Anderson (1957) shows
either a small, medium, or large amount of residual for three
additional variables. This symbology can be grasped quite
quickly by eye. A metroglyph is shown in Fig. 7.21.

(4)  We can project data points down onto an arbitrary plane without
indicating the residuals. If this is done, however, it seems
important to position this plane or the line meaningfully and
to restrict those points that are plotted on this graph to
those having small residual variation. (See Fig. 7.22.)

(5)  The graph of MSE(K) vs. K is extremely useful for MSE clustering
algorithms. In Fig. 7.23 we show the difference in this
curve between Data Set 1, clustered data, and Data Set 12,
uniformly random data.
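The coordinates used in plots (1) and (2) of the list above can be computed as in the following sketch. The function names and the way the line and plane are specified (an origin point plus direction vectors) are assumptions for illustration; the original plotting programs are not described in that detail.

```python
import numpy as np

def project_onto_line(points, origin, direction):
    """Return (distance along the line, perpendicular distance from the line)."""
    u = direction / np.linalg.norm(direction)       # unit vector along the line
    rel = points - origin
    along = rel @ u                                  # coordinate along the line
    residual = np.linalg.norm(rel - np.outer(along, u), axis=1)
    return along, residual

def project_onto_plane(points, origin, v1, v2):
    """Return (2-D coordinates in the plane, perpendicular residual distance)."""
    # Orthonormalize the two spanning vectors; the sign of each axis is arbitrary.
    basis, _ = np.linalg.qr(np.column_stack([v1, v2]))
    rel = points - origin
    coords = rel @ basis                             # in-plane coordinates
    residual = np.linalg.norm(rel - coords @ basis.T, axis=1)
    return coords, residual
```

For plot (1) the two returned arrays are the two plotted coordinates; for plot (2) the residual would set the size of each plotted symbol, and for plot (4) it would serve to select only the points lying close to the plane.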

An Interactive Computer System for Analyzing Multivariate Data


In a project presently underway at Stanford Research Institute** we
are making use of an interactive computer to give us considerable
convenience in selecting and modifying the point of view from which we


*  For a discussion of a wider variety of multivariate plots see Ball (1967).

** David Hall of SRI and the writer have done the planning on this
project together. Mr. Hall and Dan Wolf have done the computer system
design and the programming. The project is supported by Air Force
Contract AF 30(602)-‘119C under the technical cognizance of the
Information Processing Branch of Rome Air Development Center.




examine our data. Parts of the computer system are shown in Fig. 7.24.
We will have available data manipulation programs that would allow us to
modify the scaling or the variables actually used in describing a data
set, and statistical routines including principal components and the like.
Eventually, we will have available statistical routines for testing
hypotheses. We have cluster-seeking techniques for finding good places
within the data to look, and a section in which it would be possible to
create data artificially in order to test a particular model or to
generate data from a model with which experimental data can be compared.
Perhaps most importantly, we will have a large variety of graphical
presentations that will allow a person to explore the data points as
nearly as possible in their proper perspective in the hyperspace in
which they lie. It is our intention that we will be able to do this
with considerable convenience. If this occurs, we expect to be able to
far surpass what the human being is able to do with a series of
two-dimensional plots, since we will be able to guide the computer into
those positions that will give us the most "information."

VII  CONCLUSIONS


For systematic analysis of relatively clean data, where the finding
of the MSE partition for small numbers of clusters is a reasonable goal,
the Singleton-Kautz algorithm appears to find partitions that have lower
values of SSE than ISODATA. From past experience with other data,
ISODATA appears to be superior for noisy data, where the goal is quick
isolation of the principal modes of the data with exclusion of outliers.

The  program  implementing  the  Singleton-Kautz  algorithm  is  easier  to 
use  in  a  batch-processing  computer.  We  feel  that  ISODATA  may  prove 
easier  to  use  in  an  interactive  computer  in  which  the  judgment  of  the 
operator  is  used  in  lumping,  splitting  and  evaluating  clusters. 

The relative speed of convergence of the two algorithms to an MSE
partition apparently depends to a greater degree than we had expected on
the number of patterns and the number of dimensions. This aspect of the
comparison must await further experimental investigation.

For finding partitions that minimize the sum of Mahalanobis type
distances it appears at this time that ISODATA would be computationally
more rapid.

Interpretation of the results of these clusterings is by no means
easy. Several different ways of presenting the data are described and
an interactive display-oriented computer system for analyzing
multivariate data is discussed.

At this time we see the two most important goals of cluster-seeking
techniques as being:

(1)  To describe the data as simply as possible, consonant with the
user's need for accuracy.

(2)  To evaluate the degree to which structure exists in a
body of data.

FIG. 7.1  A PICTORIAL DESCRIPTION OF ISODATA

Rotating Principal Axes Into Approximately Mutually
Exclusive Categories
Robert C. Ryder
Child Research Branch
National Institute of Mental Health
If one considers the history of personality research and thought over
the past sixty years or so, there seems to be a general trend for terms that
have originally designated categories to eventually become labels for dimensions.
The category customarily becomes diminished in meaning to merely designate
extreme scoring cases on the dimension. Hysterics become high scoring cases
on a scale of hysteria, neurotics are those scoring high on neuroticism,
extraverts score high in extraversion, and so on. At least where category or
type means what Cattell calls a homostat, i.e., a collection of objects with
similar attributes, this is likely to be the course of events for most typologies.
Science tends to move toward greater precision. The idea of category is
intrinsically binary, and hence usually involves throwing information away.
Therefore, where situations permit, yes-no category concepts tend to drift toward
continua, to permit more precise measurement potentialities.

Even the following progression is possible. An investigator factors a
group of variables and obtains, say, seven dimensions. Each subject is then
represented by a profile of seven factor scores. The investigator uses the
factor scores as a basis for clustering individuals, and obtains 14 categories
of subjects. In later work with this category system he becomes dissatisfied
with the crudity of simply labeling a subject as in or out of a category, and
so speaks, say, of the precise distance a subject is from the centroid of the
category. But a subject has a distance from the centroid of each category,
so now each subject is represented by a profile of 14 distances. Thus science
advances.

While it is convenient and useful to measure continua, there are often
situational and conceptual restraints which lead to retaining binary distinctions.
It is not customary, for example, although perhaps it should be, to admit a
student half-way to college, or to partly hire a person, or to assign a man to
a position between two job categories, or to be semi-married, or to have a piece
of furniture that is somewhere between a couch and a desk. In personality work,
direct observation or conceptualization may conflict with the idea of a continuum.
Many scores are possible on most measures of psychoticism; but a number of
clinicians continue to maintain that being moderately psychotic is like being
moderately pregnant.

In general, for personality work, the idea of a type or a syndrome is useful
for those who must try to comprehend individual cases, or who want to put some
flesh and blood reality into the psychometrician's abstractions. This way it is
possible to imagine how various attributes may fit together as an organic unity,
and to engage in some meaningful gestalt completion of a pattern that may be
exhibited only in fragments. Since Mrs. Jones looks a bit like a classical
hysteric in the way she presently acts, should we not keep an eye out for other
parts of the pattern? Will she also exhibit conversion symptoms, la belle
indifference, or even fugue? A practical example of the use of the syndrome idea
is to be found in Kennedy (1965). A decision is made as to whether a child fits
one or another type of school phobia, and then the therapist makes explicit and
direct use of his educated guesses as to various, so far unrevealed, aspects of
the child's behavior.

There is no problem in using dimensions for measurement purposes while
continuing to think in terms of categories, syndromes or types, as long as only
one dimension is considered at a time. Types can simply refer to portions of
the dimension's range. For example, a type might be derived by any means
whatsoever, and individuals scaled on a dimension of distance from the centroid
of the type, or perhaps according to the probability that they belong in the
type. In which case the type label would refer to low distance scores or high
probability values. There is, however, a practical problem where a number of
different dimensions are used simultaneously. Since it is convenient to think
of types as mutually exclusive, it is no longer possible simply to translate
type into extreme score. One deals instead with profiles of scores, and
talks about a particular score profile as being such and such a type, with such
and such a set of miscellaneous attributes. However, with even a moderate
number of variables, the number of possible profiles becomes great, even if
the number of popular profiles is not so great. Apart from the number of
profiles, the situation is sloppy and inconvenient from the point of view of
a human user. Why should a syndrome be defined in terms of three or five or
seven variables if it can be defined in terms of one? The intent of the
procedure to be described here is to reduce this sloppiness, to try and make the
case with several variables similar to that with one variable. That is, in
the ideal situation, no matter how many variables are used a category remains
defined as an extreme score on one and only one dimension, even as categories
remain mutually exclusive. To put it another way, the attempt is to juggle
things in such a way that all, or as many as possible, of the score profiles
are simple, single spike profiles.

The procedure is as follows. Take an N observations by T variables
data matrix $M$, standardized for convenience in such a way that $M'M$ is directly
$R$, the T by T matrix of product moment correlations among variables. In general,

\[ M = X \Lambda^{1/2} Y' \]

where

\[ X'X = I, \qquad Y'Y = I, \]

and $\Lambda$ is a diagonal matrix. Principal axes operations are used to obtain a
matrix of loadings, $Y\Lambda^{1/2}$, and one of scores, $X$.

Customary procedure at this point would either be to leave the axes
unrotated, or to rotate Y in such a way as to yield simple structure among
variables. The present suggestion is just to rotate the matrix of factor
scores, in such a way as to yield simple structure among subjects. Ideally,
the resulting matrix, say XA, should be one where each subject has approximately
zero scores on all dimensions but one, i.e., all profiles should be single spike
in form.


In the examples to be presented, orthogonal rotation is employed, using
normalized Quartimax procedures and thus maximizing the likelihood that each
observation will have as few as possible scores that are as extreme as possible.
A is therefore a square orthonormal matrix. Orthogonal rotation seems reasonable
in view of the intent of this score or profile rotation (Ryder, 1964). That
is, there is no way each subject can have no more than one nonzero score without
XA being orthogonal. Since

\[ (XA)'(XA) = A'X'XA = A'A = I, \]

this condition is fulfilled by orthogonal rotation.


It is possible for XA to be orthogonal while the columns of XA are not
uncorrelated, as when scores for each dimension are either some positive value
or zero. More categories are represented per dimension, however, if dimensions
are bipolar, with positive score values, negative score values and zero score
values, a situation that is guaranteed by the indicated standardization of M.

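If

\[ \mathbf{1}'M = \mathbf{0}', \]

then

\[ \mathbf{1}'MY = \mathbf{0}', \qquad \mathbf{1}'X = \mathbf{0}', \qquad \mathbf{1}'XA = \mathbf{0}', \]

and

\[ (XA)'(XA) = I \]

is directly the correlation matrix among profile rotated scores. If

\[ \mathbf{1}'M \neq \mathbf{0}', \]

then possibly

\[ \mathbf{1}'XA \neq \mathbf{0}', \]

but

\[ (XA)'(XA) = I \]

still holds.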


The rotated factor matrix $F$ corresponding to the profile rotation
$XA$ is found by taking

\[ F(XA)' = M' \]

\[ F = M'XA = Y\Lambda^{1/2}X'XA = Y\Lambda^{1/2}A \]

so that one can obtain $F$ by either postmultiplying $M'$ by $XA$ or, if $A$ is known,
postmultiplying $Y\Lambda^{1/2}$ by $A$. In either case the result is a matrix of factor
loadings so rotated as to maximize the likelihood of simple factor score profiles.
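The identities above can be checked numerically. The sketch below is only an illustration: the data are random, all T principal axes are retained, and a random orthonormal matrix stands in for A (the paper uses a normalized Quartimax solution, which is not implemented here). The variable names are assumptions chosen to match the symbols in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 4
raw = rng.normal(size=(N, T)) @ rng.normal(size=(T, T))  # correlated variables

# Standardize so that M'M is directly the correlation matrix R.
M = (raw - raw.mean(axis=0)) / (raw.std(axis=0) * np.sqrt(N))
R = M.T @ M

# Principal axes via the singular value decomposition: M = X diag(sqrt_lam) Y'.
X, sqrt_lam, Yt = np.linalg.svd(M, full_matrices=False)
Y = Yt.T
loadings = Y * sqrt_lam            # Y Lambda^{1/2}, the unrotated loadings

# Stand-in rotation: any square orthonormal A satisfies the identities.
A, _ = np.linalg.qr(rng.normal(size=(T, T)))

scores = X @ A                     # profile rotated scores XA
F = M.T @ scores                   # rotated factor matrix, F = M'XA

print(np.allclose(scores.T @ scores, np.eye(T)))   # (XA)'(XA) = I
print(np.allclose(F, loadings @ A))                # F = Y Lambda^{1/2} A
print(np.allclose(F @ scores.T, M.T))              # F (XA)' = M'
```

The last identity relies on keeping all T axes; with fewer axes retained, F(XA)' would only approximate M'.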


As part of continuing research on the first years of marriage (Raush,
Goodrich and Campbell, 1963; Goodrich and Ryder, 1966; Ryder and Flint, 1966;
Ryder and Goodrich, 1966; Ryder, 1966), a great deal of information was gathered
concerning a small group of suburban middle-class newlywed couples: N varied
from  4!  to  49  couples  as  a  function  of  missing  data.  Since  many  more  variables  were 
measured than there were couples, an intensive effort was made to reduce the
number of variables to a manageable size. What we did was to consider all the
variables from a given technique or kind of technique, such as interview codes,
questionnaires, or objective testing, and cluster them on a more or less ad hoc
basis. The resulting clusters were then factored. The factors from these
several techniques were then jointly factored in a final synthesis factoring
which was based on only 15 variables (the 15 being previously extracted
factors). This iterative factoring procedure also was intended to reduce the
likelihood of method factors, which customarily appear when data from several
sources are jointly factored (Cartwright, Kirtner, and Fiske, 1963; Cartwright
and Roth, 1957; Forsyth and Fairweather, 1961; Gibson, Snyder, and Ray, 1955;
Nichols and Beck, 1960).

Our factor data, and our factor score data, thus include three principal
components based on an objective test called the Color Matching Test (CMT)
(Goodrich and Boomer, 1963; Ryder and Goodrich, 1966; Ryder, 1966), four based
on a content analysis of interview material, four based on ratings of interviews,
and four based on the synthesis analysis. There were also four factors based
on questionnaire material; but complications in their extraction and composition
make it convenient to bypass them in the present discussion.

Factor scores for these various analyses were computed with and without
profile rotation to get at least a rough idea of whether profile rotation
increases the number of single spike profiles. In order to talk about spikes,
it was necessary to define some convention as to what was an extreme score.
It was decided to use that cutting point which would hypothetically permit
perfect differentiation between two sets of scores. Procedure was as follows:

1)  Two sets of factor scores, from the same principal components
analysis, were tabulated in a frequency distribution of absolute
scores, pooling between analyses and among axes.

2)  The cutting point for extreme was located so as to leave (as
closely as possible) N extreme scores overall, where N was as
usual the number of observations.

If all profiles were either single spike or no spike, there could then
be perfect, i.e., all single spike, representation of the scores using profile
rotation, and no spiked profiles for the other set of scores (or vice versa).
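The cutting-point convention in steps 1) and 2) can be sketched as follows. This is a hedged reading of the procedure, with hypothetical function names: the absolute scores of both sets are pooled, the cutting point is taken as the N-th largest pooled value, and a score counts as a spike when its absolute value is at or above that point. How ties at the cutting point were handled in the original analysis is not stated, so the treatment here is an assumption.

```python
import numpy as np

def cutting_point(scores_a, scores_b):
    """N-th largest absolute score after pooling both N-by-k score matrices."""
    n_obs = scores_a.shape[0]
    pooled = np.abs(np.concatenate([scores_a.ravel(), scores_b.ravel()]))
    pooled = np.sort(pooled)[::-1]          # descending order
    return pooled[n_obs - 1]

def spikes_per_profile(scores, cut):
    """Number of extreme scores in each observation's profile."""
    return (np.abs(scores) >= cut).sum(axis=1)

def spike_distribution(scores, cut):
    """Frequency of profiles with 0, 1, ..., k spikes."""
    counts = spikes_per_profile(scores, cut)
    return np.bincount(counts, minlength=scores.shape[1] + 1)
```

In the ideal case described in the text, one set of scores yields a distribution with all N profiles in the one-spike row while the other set has all N profiles in the zero-spike row.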

Consider first the analysis of content codes from our interviews with
the newlywed couples. Ten to 12 hours of interviewing per couple were subjected
to a detailed content analysis, the data from which were summarized in 37 clusters
of codings. Four principal components were computed both for the unrotated
axes and for profile rotated data. The frequency distributions of numbers of
spikes are given in Table 1 for unrotated and for profile rotated data.

It can be seen that there was a modest trend toward more single
spike profiles with profile rotation. For the content analysis data alone, a
check was made on the effects of conventional varimax rotation of factor loadings.
Results for the corresponding factor scores, using the same cutting point, are
given in Table 2.

Note that the number of single spike profiles is the same for unrotated
data and data rotated in the usual manner.

Tables 3 and 4 compare unrotated and profile rotated scores for interview
ratings and for the CMT, respectively. Notice that for CMT data the trend toward
more single spike profiles with profile rotation is reduced to the vanishing
point, and that the trend vanishes entirely for interview rating data.

The one remaining principal components analysis combines data from
these other several sources, plus questionnaires, i.e., it is the synthesis
analysis. Results for this are given in Table 5, and are a shade more
encouraging.

On the assumption that there might be an interest in the qualitative
changes that might derive from profile rotation, factor loadings for the
synthesis analysis are given in Table 6. These are expressed not in terms of
the first order factors that went into the synthesis analysis, but in terms of
some of the variables on which those first order analyses were based. Variables
included in Table 6 are those which loaded ≥ |.50| on at least one factor
from each rotation. The most striking difference between the two sets of
factor loadings seems to be that profile rotation tends to bring the evaluative
variables together in the same factor more than is the case for the unrotated
factors.

The upshot of these various analyses is fairly disappointing. There is
a slight tendency for profile rotation to increase the number of single spike
profiles, at least for this sample; but so slight as to make it doubtful that a
reasonable increment in single spike profiles is a dependable consequence of
profile rotation. The trends have seemed so slight as to make it absurd to
try and dignify them with inferential statistics. On the other hand these slight
trends, combined with anomalies of these analyses (too small a sample and too
many variables), are enough to keep alive the anticipation that with a larger
sample and a cleaner set of variables the trends would prove to be nonchance
and of a magnitude to make profile rotation worthwhile. It should be noted in
passing that the total frequencies of spikes are determined by the procedure
for setting cutting points for extreme scores. Juggling the cutting points
around to ad hoc optima could lead to a much greater number of single spike
profiles, and to a greater (or lesser) advantage for profile rotation. At any
rate, data is now being collected on a far larger sample, and there should be
more definitive information in due course.

Table 1

Unrotated and Profile Rotated Factor Scores
Based on Interview Content Analysis

                                        f
Spikes per Profile    Unrotated    Quartimax Profile Rotated

0)                    29           24
1)                    18           23
2)                     2            2
3)                     0            0
4)                     0            0

Note: N = 49, four axes extracted from 37 variables.

Table 2

Scores Corresponding to Varimax Rotation of Factor Loadings
Based on Interview Content Analysis

Spikes per Profile    f

0)                    26
1)                    18
2)                     4
3)                     0
4)                     0

Note: N = 48, four axes extracted from 37 variables.


Table 3

Unrotated and Profile Rotated Factor Scores
Based on ?1 Rated Interview Variables

                                        f
Spikes per Profile    Unrotated    Quartimax Profile Rotated

0)                    28           31
1)                    17           15
2)                     3            3
3)                     1            0
4)                     0            0

Note: N = 49, four axes extracted.


Table 4

Unrotated and Profile Rotated Factor Scores
Based on 17 CMT Variables

                                        f
Spikes per Profile    Unrotated    Quartimax Profile Rotated

0)                    30           29
1)                    16           18
2)                     2            1
3)                     0            0

Note: N = 48, three axes extracted.
Table 5

Unrotated and Profile Rotated Factor Scores
Based on Synthesis Analysis

Spikes per Profile    Unrotated    Quartimax Profile Rotated

0)                    31           27
1)                    13           19
2)                     4            1
3)                     0            1
4)                     0            0

Note: N = 48, four axes extracted from 15 variables
that were themselves factors.

Variables include only those loading ≥ |.50| on at
least one factor of each rotation.

CLUSTER ANALYSIS AND THE SEARCH FOR STRUCTURE UNDERLYING
INDIVIDUAL DIFFERENCES IN PSYCHOLOGICAL PHENOMENA *

Ledyard R Tucker
University of Illinois

Research on techniques for investigation of individual differences in
psychological phenomena is related in several ways to the subject of this
conference: cluster analysis of multivariate data. A first and major relation
is to one of the important possible motivations for cluster analysis. This
relation involves the general formulation of the research project at the
University of Illinois on techniques for investigation of individual
differences in psychological phenomena. Further relations involve some common
technical problems and solutions.

Consideration of differences between individuals in psychological phenomena
has had a long history dating back, undoubtedly, to the first thoughts of man
in description of the behavior of other men. These considerations have
continued and have entered the science of psychology at various points such as
in the studies of the "personal equations" initiated by the astronomer Bessel,
in the proposal of body types by Kretschmer, and in the development of
differential psychology as furthered by Galton. A variety of techniques have
been developed for the study of the structure of individual differences in
measurable attributes of individuals. Refinements and extensions of these
techniques as well as the development of newer techniques are in progress for
improving these studies. Even so, much of this work does not bear directly on
some central problems in psychology and in relations between variables for
single individuals. A first approximation in the description of this latter
area is to describe it as a combination of traditional experimental psychology
and differential psychology.

Cronbach, in his presidential address to the American Psychological
Association in 1957, gave an excellent historical review and discussion of the
contrast between experimental psychology and what he called correlational
psychology, which we may identify as differential psychology. He refers to the
two disciplines of psychology as "two historic streams of method, thought, and
affiliation which run through the last century of our science". He further
noted that there has been recognition of the distinctions between these
streams and that statements of hopes to bring them together have been made
since the time of Wundt. For example, Cronbach stated that "Dashiell
optimistically forecast a confluence of these streams, but that confluence is
still in the making" and "Hull sought general laws just as did Wundt, but he
added that organismic factors can and must be accounted for. He proposed to do
this by changing the constants of his equation with each individual. This is a
bold plan, but one which has not yet been implemented in even a limited way."
A further comment by Cronbach which is quite relevant to this report is,
"Tucker, though, has at least drawn blueprints of a method for deriving Hull's
own individual parameters by factor analysis." I wish to add, hurriedly, that
Cronbach is only partly correct in this reference to my work, which is not
based on the Hullian learning curves; but this work is concerned with the
development of individual parameters as indicated by Cronbach. Further, this
work includes the study

* This paper is based on research jointly supported by the University of
Illinois and the Office of Naval Research under contracts Nonr 1834(39) and
N00014-66-C0010A03.
of possible structural relations among parameters for individuals in a
population. The confluence of experimental psychology and differential
psychology is realized by an expansion of the concepts of psychological laws
to involve individual parameters coupled with an extension of differential
psychology to description of individual differences in these parameters, and
thus in the psychological laws.

Study of parameters of a functional relation of a dependent variable to an
independent variable, of which the study of learning functions is an example,
is but one phase in the search for structure of individual differences
underlying psychological phenomena. Work has progressed on developments in
other parts of this general area of problems. Procedures have been
investigated for description of individual differences in psychological
scaling, both unidimensional scaling involving judgments in relation to named
attributes such as preference or value, and multidimensional scaling. Closely
related procedures have been investigated for the study of individual
differences in judgments of similarity between pairs of stimuli, such as
adjectives used to describe personality attributes. These techniques involve
multivariate procedures closely related to factor analysis. A further
development is the extension of factor analysis to consideration of data
characterized by three modes of classification such as by individuals, traits
measured, and occasion of measurement. Another example of three mode data
would be the extent of reaction of individuals on several variables of
reaction in several stimulus situations. The relation of pattern of reaction
over variables for different stimulus conditions may be dependent on the
individual, who may be described by a group of parameters. Study of these
individual parameters is involved in the search for structure of individual
differences in psychological phenomena.

Some  of  the  issues  involved  in  the  study  of  individual  characteristics 
in  psychological  phenomena  may  be  clarified  by  the  following  four  attributes 
for  description  of  scientific  endeavors  in  psychology. 

A.  Behaviors  studied:  single  or  multiple. 

B.  Measures  obtained:  single  or  multiple. 

C.  Values of the measures for individuals or within-individual
    relations between measures for several variables.

D.  Individual  differences:  none,  structured,  chaotic. 

The behaviors studied cover a wide range of activities of subjects, both in
natural observational situations and experimental situations, such as
conversation with other individuals, response to particular stimuli,
performance on a given task, etc. The measures also cover a wide range of
possibilities so that one or more measures may be obtained from any one
behavior. For example, responses of subjects in a word association experiment
may be measured by latency of response, galvanic skin reaction, and rareness
of response word. Any one of these measures or several of them may be recorded
for each response of a subject.

Attribute C is related to the common contrast made between S-R and R-R
studies, but involves a basically distinct contrast. Many studies involve
observation of the value of a single response measure from each of a number of
behaviors for each subject and then study the relations between these measures
over a group of subjects; these studies are classifiable as R-R studies and
are examples of studies of the values of measures for individuals. It is
possible to obtain several response measures for each behavior of a subject in
some category of behavior, such as a word association experiment, and to study
the relation between the response measures over a number of behaviors for the
same subject. These studies are also classifiable as R-R studies but are
studies of the within-individual relations between measures. Many studies
classifiable as S-R studies involve within-individual relations between
variables: a stimulus variable and a response variable. An extension of this
class of studies may be denoted as S-(R_1, R_2), for which two measures of
response are obtained for each value of a stimulus situation and the relations
of these response variables to the stimulus variable and to each other are
studied.

Attribute D concerns the focus of experiments and assumptions made concerning
differences between individuals in the phenomena being studied. Differential
psychological studies emphasize the individual differences and tend to assume
a structure in these differences. Many studies in experimental psychology
minimize the differences between individuals, assuming either that there are
no such differences or that the differences are chaotic and represent chance
discrepancies from general laws of relations.

Using  these  attributes,  many  studies  in  differential  psychology  could  be 
described  as: 

A.  multiple  behaviors  studied, 

B.  single  measure  obtained  for  each  behavior, 

C.  value  of  each  measure, 

D.  structure  of  individual  differences  in  these  observations. 

In  contrast,  many  studies  in  experimental  psychology  could  be  described  as: 

A.  single  behavior  studied, 

B.  multiple  measures  obtained, 

C.  within-individual  relations, 

D.  assumption  of  no  or  chaotic  individual  differences. 

A comparison of the search for individual differences in psychological
phenomena with these two contrasting profiles is profitable. This project
emphasizes:

A.  either  single  or  multiple  behaviors, 

B.  multiple  measures  obtained  for  each  behavior, 

C.  within-individual relations,

D.  structure  of  individual  differences  in  these  relations. 

This profile of attribute values has some similarity to each of the preceding
profiles, but it is not a compromise between them. It goes beyond either of
these types of studies and encompasses a number of very interesting and
important problems. The motivation for this search for individual differences
in psychological phenomena is not just to merge the two disciplines but is to
solve problems not encompassed by either discipline.

A most interesting possibility in the structure of individual differences of
within-individual relations between variables is that there exist clusters of
individuals such that the within-individual relations are the same for all
individuals in each such cluster and differ from cluster to cluster. If such
clustering of individuals is the case, even within a reasonable approximation
to the actual structure of individual differences, the study of
within-individual relations and the application of knowledge gained can be
increased considerably in precision. Theories of learning, for example, could
be constructed such that special cases would be applicable for each cluster of
individuals. These special case learning theories would fit the learning
behavior of individuals better than a learning theory that ignored individual
differences. If there are clusters of individuals in the relations between
abnormal psychological behavior and treatment, then the description of the
effects of treatments could be increased in precision. Further, being able to
place any mental patient within a cluster of individuals would aid materially
with selection of treatments to lead to desirable behavior changes. It might
be the case that seemingly conflicting theories of personality refer to
different clusters of individuals in the dynamics of personality behavior and
are special cases of a more general, but flexible, theory of personality which
takes on different forms for the several clusters of individuals.

Before discussing cases of individual differences in within-individual
relations some consideration will be given to work on personalizing regression
estimation of criterion variables from selected predictor variables. Ghiselli
(1956, 1960a) reported on work on the prediction of predictability in which he
developed tests to predict the absolute values of errors of estimate in the
regression of a criterion variable on a predictor variable. In terms of the
errors of estimate, he could place individuals in two categories: one with low
absolute errors of estimate and the other with high absolute errors of
estimate. By constructing a new measure using item analysis procedures he was
able to approximate the placement of individuals in these classes. This
constitutes a simple case of categorizing individuals as to the relation
between a criterion variable and a predictor variable.
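
The categorization described above can be illustrated numerically. The following is a minimal sketch in Python with NumPy and is not Ghiselli's procedure: the simulated scores, the variable names, and the median split used to form the two categories are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # hypothetical number of individuals

# Simulated scores (assumption): one predictor and one criterion.
predictor = rng.normal(size=n)
criterion = 0.6 * predictor + rng.normal(scale=0.8, size=n)

# Least-squares regression of the criterion on the predictor.
slope, intercept = np.polyfit(predictor, criterion, deg=1)
abs_error = np.abs(criterion - (slope * predictor + intercept))

# Two categories: low versus high absolute error of estimate.
# The median split is an assumption; the source does not state a cut point.
high_error = abs_error > np.median(abs_error)
print(high_error.sum(), (~high_error).sum())
```

A separate measure built by item analysis would then be needed to approximate membership in the `high_error` category; that step is not shown here.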

In a second activity, Ghiselli (1960b) worked on the differentiation between
tests as to the accuracy with which they predict a criterion for a given
individual. In this case two predictor variables were considered separately
and errors of estimation were obtained for each predictor in a regression with
the criterion variable. The absolute values of these errors of prediction were
used and categories were established according to which error of prediction
was larger in absolute value. Again, a new measure was constructed by item
analysis procedures to approximate the placement of individuals in the
categories. This is a most interesting possibility for the categorization of
individuals according to the relations between variables. A point to note is
that the individuals do not form homogeneous groups as to either the predictor
variables or the criterion variables. The categories are related to the
relations between the criterion variable and the predictor variables.
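
A small numerical sketch may make this second kind of categorization concrete. It is only an illustration under stated assumptions: the scores are simulated, and the names `test_a`, `test_b`, and `abs_errors` are hypothetical and do not come from Ghiselli's report.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200  # hypothetical number of individuals

# Simulated scores (assumption): two predictors and one criterion.
test_a = rng.normal(size=n)
test_b = rng.normal(size=n)
criterion = 0.5 * test_a + 0.5 * test_b + rng.normal(scale=0.7, size=n)

def abs_errors(predictor, criterion):
    """Absolute errors of estimate from a one-predictor least-squares regression."""
    slope, intercept = np.polyfit(predictor, criterion, deg=1)
    return np.abs(criterion - (slope * predictor + intercept))

err_a = abs_errors(test_a, criterion)
err_b = abs_errors(test_b, criterion)

# Category for each individual: the test with the smaller absolute error.
better_test = np.where(err_a <= err_b, "A", "B")
```

Note that the categories depend only on the comparison of the two errors, so individuals in the same category need not resemble one another on the predictors or on the criterion.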

The work by Ghiselli is related to the study by Frederiksen and Melville
(1954) on differential predictability of test scores and to Saunders' work
(1955, 1956) on moderator variables. More recently, Cleary (1966) has proposed
a technique for investigation of the possibility of developing systems of
individualized regression weights in estimation of a battery of criteria from
a battery of predictor variables. Given scores $x_{pj}$ of persons
p = 1, 2, 3, ..., P on predictor variables j = 1, 2, 3, ..., J and $y_{pk}$ of
the persons on criterion variables k = 1, 2, 3, ..., K the personalized linear
regression equation can be written as

1)  $y_{pk} = \sum_{j} w_{pjk} x_{pj} + e_{pk}$

where $w_{pjk}$ are the personalized regression weights and $e_{pk}$ are
errors of estimate. The personalized regression weights are defined by

2)  $w_{pjk} = \sum_{m} b_{pm} a_{mjk}$

for dimensions m = 1, 2, 3, ..., M of a regression weight space and where
$b_{pm}$ are coefficients for the persons and $a_{mjk}$ are coefficients for
combinations of predictors j and criteria k. This system degenerates to the
usual regression system when there is one dimension and all $b_{pm}$ are
unity.

Otherwise, this system provides for individual differences in the regression
weights within the limits of the number of dimensions utilized. To obtain
non-trivial solutions the number of dimensions must be fewer than the number
of criterion variables. Note that determination of the coefficients in this
system depends only on knowledge of the predictor and criterion variable
scores. In an experimental application of this system to a case involving five
criteria, two batteries of five predictors, and two samples of individuals,
Cleary found that the use of two dimensions in the regression weight space for
each battery of predictors markedly reduced the sum of squared errors of
estimate and that the $a_{mjk}$ coefficients were very stable when determined
separately from the two samples. The person coefficients $b_{pm}$ had one
stable dimension when determined for each sample separately from the two
batteries of predictors. While the person coefficients $b_{pm}$ are determined
in this model from knowledge of both the predictor and criterion scores, which
makes use of the model questionable in applied situations, approximations to
these coefficients may be obtained from measures developed by test
construction and item analysis techniques. A very interesting possibility is
that the individuals might be distributed in a number of clusters according to
the values of their coefficients $b_{pm}$. If this were the case, categories
might be established such that different regression systems were appropriate
for the different categories. Such categorization would be extremely important
in that it would indicate the existence of sub-populations of people for which
different laws of relations existed between measures of behavior. Knowledge of
these differences in laws of relation would add materially to our knowledge of
psychological phenomena.
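
The structure of the personalized regression system can be sketched in a few lines of Python with NumPy. This is an illustration of the bookkeeping only, not Cleary's estimation procedure: the coefficients are random numbers, the array sizes are hypothetical, and the error term $e_{pk}$ is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
P, J, K, M = 6, 3, 3, 2   # persons, predictors, criteria, dimensions (hypothetical sizes)

x = rng.normal(size=(P, J))      # predictor scores x_pj
b = rng.normal(size=(P, M))      # person coefficients b_pm
a = rng.normal(size=(M, J, K))   # coefficients a_mjk for predictor-criterion combinations

# Personalized weights: w_pjk = sum over m of b_pm * a_mjk.
w = np.einsum("pm,mjk->pjk", b, a)
# Estimated criterion scores: sum over j of w_pjk * x_pj (error term omitted).
y_hat = np.einsum("pjk,pj->pk", w, x)

# Degenerate case: one dimension with every b_pm equal to one gives all persons
# the same weights, which is the usual regression system.
a_one = rng.normal(size=(1, J, K))
w_one = np.einsum("pm,mjk->pjk", np.ones((P, 1)), a_one)
y_hat_one = np.einsum("pjk,pj->pk", w_one, x)
```

With M = 2 and K = 3 the sketch respects the condition that the number of dimensions be fewer than the number of criterion variables.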


Extensive and critical studies should be conducted as to the possibility of
the clustering of individuals as to within-individual relations between
variables. For an example of such studies consider the area of color
perception. Illustrative data for such a study are given in Table 1. These
data are fictitious, being constructed to present a simpler version of results
obtained by Helm and Tucker (1962); these data, however, represent fairly
faithfully some of the major aspects of the results obtained by Helm and
Tucker from real data. In the real data, Helm obtained measures of judged
interpoint distances between stimulus objects for each pair of such objects
using the method of triad-ratio judgments. These measures were obtained
separately for each subject. The stimulus objects used by Helm were ten
hexagonal tiles, 2 inches across, each painted with a different color, such
that the ten colors were of the same lightness and saturation and formed an
equally spaced circle of hues. The data in Table 1 are interpoint distances
for pairs of eight color stimuli and six individual subjects. A major
attribute of the fictitious data in Table 1 is that it is constructed so that
the model for studying individual differences in multidimensional scaling
(Tucker and Messick, 1963) fits perfectly. One of the technical problems to be
discussed subsequently in this paper concerns the fitting of the model to real
data involving discrepancies of the model from the observed data. Individuals
1, 2, and 3 have normal color vision while individuals 4, 5, and 6 have
progressively weaker color vision. This distribution of subjects as to color
vision has relatively too few individuals with normal color vision but seems
to represent fairly well the progression of color-weak subjects as appearing
in the Helm and Tucker results. There is an unresolved question as to the
distribution of relative extents of weakness in color vision in the
population.

Let us compare the color judgment data of the Helm and Tucker study with the
four attributes of studies discussed earlier. A series of behaviors exists for
each subject: the judgments of relative differences of pairs of stimuli in
triads of stimuli. Two measures are obtained for each behavior: the judged
ratios of relative differences between stimuli in the two less different pairs
of stimuli to the relative differences between stimuli in the most different
pair in a triad of stimuli. These data have been analysed for each subject to
yield relative interpoint distances between stimuli in each pair of stimuli
from the set of stimuli used in the study. These relative interpoint distances
could be analysed for each subject to uncover a multidimensional scaling of
the perceptual space for that subject, a step that was performed by Helm.
These multidimensional scalings constitute within-individual relations among
the measures. Analysis of the matrix of relative interpoint distances, such as
in Table 1, by the Tucker and Messick model for individual differences in
multidimensional scaling is an investigation of the structure of individual
differences in these within-individual relations. Thus, this study illustrates
the profile of attribute values for the search for individual differences in
psychological phenomena.

Analysis of the individual interpoint distances for the structure of the
individual differences takes these interpoint distances as input data and
forms a matrix, which is designated X and is illustrated in Table 1, with a
row for each pair of stimuli and a column for each individual. This analysis
proceeds to determine what is called here the characteristic components of the
matrix X by a technique based on the theorem by Carl Eckart and Gale Young
(1936) on the approximation of a matrix by another of lower rank. This
technique is closely related to the method of principal components proposed by
Harold Hotelling (1933). For the sake of clarity, discussion of several
technical points is being postponed to follow presentation of the analysis
technique as applied to the data in Table 1. Steps in the analysis are
outlined below.

A.  Compute the matrix product $X'X$ which contains the sums of squares of
entries in the columns of X as diagonal entries and sums of products of pairs
of entries in each pair of columns as off-diagonal entries.

B.  Determine a scaling constant $k^2$ by dividing the number of individuals
by the sum of the diagonal entries in $X'X$ (this sum equals the sum of
squares of the entries in X).

C.  Multiply $X'X$ by the scaling constant $k^2$.

D.  Determine the characteristic roots and unit length vectors of $k^2 X'X$.
Let the roots be designated by $\gamma_m^2$ and be arranged in descending
order, and let the corresponding vectors be designated by $V_m$. The roots for
the illustrative example are listed at the left of Table 2.

E.  Select the r largest roots (a point to be discussed) and form the matrix Z
containing as row vectors $\gamma_m V_m'$ for the selected roots. The matrix
Z, in transposed form, for the example is given in the middle of Table 2. The
entries in this matrix are the scores of the individuals on the selected
characteristic components.

F.  Determine the matrix B of loadings of stimulus pairs on characteristic
components by

3)  $B = XZ'(ZZ')^{-1}$ .

Each column vector, $B_m$, may be determined by

4)  $B_m = X Z_m' \gamma_m^{-2} = X V_m \gamma_m^{-1}$

where $Z_m$ is the m'th row vector of Z and $V_m$ is the m'th characteristic
vector written as a column vector. The matrix B for the example is at the
right of Table 2.

G.  Construct an r dimensional space corresponding to the matrix Z in which
each individual, i, is represented by a point having coordinates $z_{mi}$ on
the m orthogonal coordinate axes. For the example, this space is two
dimensional and is presented in Figure 1 with a solid point for each of the
individuals. The configuration of points in this space represents the
structure of the individual characteristics underlying the psychological
phenomenon of color vision as this phenomenon is reflected in the judgments
made for the selected group of stimuli.

H.  Inspect the configuration of points in the space constructed in step G for
interesting characteristics such as clusters of individuals. In the example,
the three normal subjects are collinear with the origin and can be thought of
as constituting a cluster. The three individuals having weakness in color
perception do not form a cluster but have points located at varying distances
from the cluster of points for individuals having normal color vision. These
distances correspond to the extent of deficiency in color vision of these
individuals. Results of this inspection may be described, in part, by
selection of points in the space that may be considered as conceptual, or
idealized, individuals which represent interesting locations in the
configuration of points for the actual individuals. In the example, two
idealized individuals were selected: one to represent the individual having
normal color vision and the other beyond the most severely color-weak observed
individual so as to, possibly, represent an individual who is totally color
blind. These are idealized individuals A and B and are indicated in Figure 1
by open circles.

I.  Construct a matrix $Z^*$ of scores of idealized individuals on
characteristic components containing the coordinates of the points for the
selected idealized individuals. This matrix has a column for each idealized
individual and a row for each characteristic component. For the example, the
matrix $Z^*$, in transposed form, is given at the left of Table 3.

J.  Determine  the  matrix  X*  of  interpoint  distances  between  pairs  of 
stimuli  for  the  idealized  individuals  by 

5)  $X^* = BZ^*$ .

This  matrix  for  the  example  is  given  on  the  right  of  Table  3. 

K.  Using the interpoint distances in each column of X*, separately by column,
perform a multidimensional scaling to obtain the perceptual space for each
idealized individual. The two spaces resulting from the multidimensional
scaling for the two idealized individuals in the example are given in
Figure 2. The space for idealized individual A, which represents normal color
vision, is two dimensional and has a circular configuration of points for the
selected color stimuli. This is the expected result. The perceptual space for
idealized individual B is also two dimensional, but has the stimuli in a
semi-circle such as would be produced by folding the circle for normal color
vision on an axis from Y to B. This was an unexpected result found by Helm in
his multidimensional scaling of individual interpoint distances, and it also
appeared in the analysis by Helm and Tucker. The expectation was that the loss
of color vision would result in a one dimensional perceptual space. This
appears not to be the case. These results raise several interesting
conjectures which could be investigated experimentally as to the perception of
color by color-blind individuals. However, discussion of these conjectures
here would take us too far afield from the major theme of this paper.
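
Steps A through K can be traced on a small invented data set. The sketch below, in Python with NumPy, is an illustration under several assumptions and is not the computation behind the tables and figures of this paper: the data are made up (eight hues on a circle, with color weakness modeled as a mixture with a folded circle), all function and variable names are hypothetical, and Torgerson's classical scaling is substituted for whatever multidimensional scaling method was actually used in step K.

```python
import numpy as np

def characteristic_components(X, r):
    """Steps A-F: component scores Z (r x individuals) and loadings B (pairs x r)."""
    n_individuals = X.shape[1]
    cross = X.T @ X                                # Step A: X'X
    k2 = n_individuals / np.trace(cross)           # Step B: scaling constant k^2
    roots, vectors = np.linalg.eigh(k2 * cross)    # Steps C-D: roots and unit vectors
    keep = np.argsort(roots)[::-1][:r]             # Step E: the r largest roots
    Z = np.sqrt(roots[keep])[:, None] * vectors[:, keep].T   # rows gamma_m V_m'
    B = X @ Z.T @ np.linalg.inv(Z @ Z.T)           # Step F: B = X Z'(Z Z')^{-1}
    return Z, B, np.sort(roots)[::-1]

def pair_distances(points):
    """Interpoint distances for all stimulus pairs, in np.triu_indices order."""
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(points[i] - points[j], axis=1)

def classical_mds(distances, n_stimuli, dims=2):
    """Step K stand-in (assumption): classical scaling of one column of X*."""
    D = np.zeros((n_stimuli, n_stimuli))
    i, j = np.triu_indices(n_stimuli, k=1)
    D[i, j] = distances
    D[j, i] = distances
    centering = np.eye(n_stimuli) - 1.0 / n_stimuli
    gram = -0.5 * centering @ (D ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Invented data (assumption): distances for a full circle of hues and for the
# same circle folded into a semi-circle, mixed in proportion to color weakness.
angles = 2 * np.pi * np.arange(8) / 8
circle = np.column_stack([np.cos(angles), np.sin(angles)])
folded = np.column_stack([np.cos(angles), np.abs(np.sin(angles))])
d_normal, d_weak = pair_distances(circle), pair_distances(folded)
weakness = np.array([0.0, 0.0, 0.0, 0.3, 0.6, 0.9])   # six individuals
X = np.outer(d_normal, 1 - weakness) + np.outer(d_weak, weakness)  # 28 pairs x 6

Z, B, roots = characteristic_components(X, r=2)

# Steps H-I: idealized individual A lies on the normal cluster; idealized
# individual B is placed beyond the most color-weak observed individual, on the
# line through A and individual 6.
z_a = Z[:, 0]
z_b = z_a + (Z[:, 5] - z_a) / weakness[5]
Z_star = np.column_stack([z_a, z_b])

X_star = B @ Z_star                                   # Step J: X* = B Z*
space_a = classical_mds(X_star[:, 0], n_stimuli=8)    # Step K, one column at a time
space_b = classical_mds(X_star[:, 1], n_stimuli=8)
```

Because the invented matrix has rank two, two components reproduce it without error, which mirrors the perfect fit built into the fictitious data of the paper. With real data the product of B and Z would only approximate X, and the choice of r would matter.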

The format of analysis outlined in the preceding paragraphs is of quite wide
applicability for a variety of types of data. One important requirement is
that the data for each subject be measures of a single dependent variable for
various values of independent variables. In the preceding example, the
independent variable was the set of colored stimulus pairs formed by the
Cartesian product of the set of colored stimuli with itself, excluding
identical pairs. The dependent variable was the judged interpoint distances.
In the generalized format, the observations of the dependent variable for each
individual would be recorded in a column of the matrix X. Each row of X would
be for some particular values of the independent variables. Steps A through J
would be conducted as described while step K would be altered to a form
appropriate to study of the relation of the dependent variable to the
independent variables. For another example consider a study of the learning of
some task. The independent variable would be the series of trials or learning
periods. The dependent variable would be a measure of the performance of a
subject on each trial. There would be a row of matrix X for each trial and a
column for each individual with entries being the measures of performance. The
matrix X* obtained in step J would contain measures of performance for
idealized individuals on the trials so that the series of entries in each
column of X* could be used to develop a learning curve for the corresponding
idealized individual. Another example could be the study of preferences among
pairs of stimuli for which the dependent variable was ratings of relative
preference. Results could yield a preference scale for each idealized
individual. Still another example could involve semantic differential ratings
of concepts on bipolar adjective scales. Each row of matrix X would be for a
pairing of a concept with a bipolar scale. Step K would involve the
determination of the connotative semantic space for each idealized individual.

There  are  several  technical  matters  involved  in  the  analysis  which  warrant 
consideration  in  this  report.  The  measures  of  the  dependent  variable  should 
be  such  as  to  support  a  study  of  the  relation  of  the  dependent  variable  to  the 
independent  variables  for  each  individual.  Further,  these  measurements  for  the 
dependent  variable  should  be  interpretable  as  on  either  an  interval  scale  or  a 
ratio  scale  for  each  individual.  In  case  the  measures  are  on  an  interval  scale 
with a meaningless origin, the origin for each individual should be set at the
mean for the individual so that deviations from the mean for the individual are
used in the analysis. This step converts the interval scale measurements to a
type of ratio scale measurements of discrepancies from the individual mean.
Necessity for a ratio class of measurement lies in the model underlying the
analysis of steps A through J.
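
The re-centering step described above can be sketched as follows. This is only a minimal illustration, not the program used for the analysis: the function name `center_columns` and the sample ratings are hypothetical, and the layout (one row per observation condition, one column per individual) is assumed from the description of matrix X.

```python
def center_columns(x):
    """Subtract each individual's (column's) mean from that column.

    x is a list of rows; each row holds one value per individual.
    """
    n_rows = len(x)
    n_cols = len(x[0])
    # Mean of each column, i.e. the mean for each individual.
    means = [sum(row[j] for row in x) / n_rows for j in range(n_cols)]
    # Deviations from the individual's own mean.
    return [[row[j] - means[j] for j in range(n_cols)] for row in x]


# Hypothetical interval-scale ratings: 3 conditions (rows) by 2 individuals (columns).
ratings = [[1.0, 10.0],
           [2.0, 20.0],
           [3.0, 30.0]]
centered = center_columns(ratings)
```

After this step every column sums to zero, so the entries can be read as discrepancies from the individual's own mean.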

Characteristic component analysis as used here is related to factor analysis,
especially to obverse factor analysis, for which factors among people are
determined. There are, however, several important distinct features. Consider
the statistical model for regular factor analysis as given in equations
(6) and (C).

For each individual sampled from some population there is a random variable
vector of dimensionality (r + n), where there are r common factors and n unique
factors. Entries in this vector are the factor scores for the individual. The
factor matrix (A | U) is a transformation on the factor score vector to yield
the random variable vector x of observed scores on the n variables. In the
population, the random variable vector of factor scores has a density function
with a mean vector and a covariance matrix. The density function for the random
variable x of observed scores has a mean vector $\mu$ and covariance matrix
$\Sigma$. The relation of the mean vector for observed scores to the mean
vector for factor scores is given in equation (7),

    $\mu = (A \mid U) \times$ (mean vector of factor scores)        (7)

and the relation of the covariance matrix of the observed scores to the
covariance matrix of factor scores is given in equation (8).

From a random sample of individuals, the vector of sample means $\bar{x}$ is an
estimate of the vector $\mu$, and the covariance matrix S is an estimate of
$\Sigma$. Factor analytic methods produce estimates $\hat{A}$ and $\hat{U}$ of
the population transformation parameters A and U. A second set of methods is
used to obtain estimates of the factor scores; these methods include such
procedures as regression estimates and Bartlett's (1936) procedure to minimize
the uniquenesses. Guttman (1955) has pointed out the insolubility of the factor
score problem.

Obverse factor analysis interchanges the role of the individual and the
attribute measured in the model for factor analysis. Any attribute measured
is considered as sampled from a population of attributes and is associated
with random variable vectors having entries for the individuals. In this case
the individuals are taken as fixed so that the factor scores of the individuals
are parameters of the model and are estimated by the analysis. For this
case, the loadings of the variables are inaccessible in the same sense as the
factor scores were inaccessible.

Since  the  search  for  individual  differences  in  psychological  phenomena 
requires  both  the  loadings  of  the  attributes  measured  and  the  factor  scores, 
neither  the  direct  factor  analysis  model  nor  the  obverse  model  is  appropriate. 

A  third  model  is  employed. 


The two major aspects of the factor analytic model for the present discussion
are the assumption of unique factors and the assumption that the individuals
or attributes measured are sample elements from a population. In
contrast, the model for characteristic component analysis does not include
unique factors and assumes that the individuals and attributes measured are
the equivalent of fixed effects. The model for this analysis is given in
equation (9), where $\epsilon_{ji}$ is a random variable with mean = 0 and
standard deviation = $\sigma_{ji}$ and is independent for each ji combination.
Exclusion of the unique factor aspect of the factor analytic model implies that
the group of variables, j, cover the domain so that particular variables are
not dependent on specific influences. Postulation of individuals as fixed
effects is necessary to enable estimation of both the loadings of the variables
and the scores of the individuals (the z's) as parameters of the model.
Such estimates are needed for the complete procedure involving the individual
space, selection of idealized individuals, and estimation of observations for
these idealized individuals. The procedure described provides a least squares
fitting of the model to the data. Also, in case the $\sigma_{ji}$ are constant
for all ji combinations, the procedure is a maximum likelihood estimation as
indicated by Young (1941). In case the $\sigma_{ji} = c\,\alpha_j \beta_i$,
where the $\alpha_j$ are known for the variables and the $\beta_i$ are known
for the individuals, a slightly more complex procedure should be used. As noted
by Anderson and Rubin (1956), it is necessary that the $\alpha_j$ and
$\beta_i$ be known.


Determination of the number r of dimensions to be used in the analysis
for any particular body of data is an unsolved problem quite analogous to the
number of factors problem. Several properties of the series of roots may
be noted in this context as yielding some guides to the selection of the number
of dimensions to be used. First, all roots are non-negative. Second, the sum
of squares of the errors of approximation of the data from the model for any
given number of dimensions chosen is given by the sum of the remaining roots.
Third, the sum of squares of the approximations to the data is given by the
sum of the roots for the dimensions used in the approximation. Thus, one
possibility is to use as many dimensions as necessary to obtain some desired
degree of goodness of fit of the model to the data. Cumulative sums of the roots
will aid in determining the number of dimensions necessary. A second
possibility is to inspect the series of roots for some break in the relation
between root size and root number. For this criterion, one postulates that the
individual space would be of some small dimensionality except for the
discrepancies involved in making the observations. Then, there should be two
laws of formation for the series of roots, one for the dimensions relevant to
the individual space and a second for the discrepancies. If this be the case,
there should be a break in the relation between root size and root number. Such
changes in form of relation have been observed. Further, for some bodies of
data, there appeared to be a linear relation between root size and root number
for a large number of roots beyond a small number of initial roots which were
larger than would be expected from this linear relation. At present, this
inspection of the series of roots seems to be the best procedure available.
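
The first guide, using cumulative sums of the roots to reach a desired goodness of fit, can be sketched as below. This is an illustration only: the function name `dimensions_needed` and the target proportions are hypothetical choices, and the roots are those listed in Table 2 for the fictitious color judgment data. Because the sum of the remaining roots equals the error sum of squares, the cumulative proportion of the roots serves here as the measure of fit.

```python
def dimensions_needed(roots, target):
    """Return the smallest number of leading roots whose cumulative sum
    reaches the fraction `target` of the total sum of roots."""
    total = sum(roots)
    running = 0.0
    for count, root in enumerate(roots, start=1):
        running += root
        if running / total >= target:
            return count
    return len(roots)


# Roots reported in Table 2 for the fictitious color judgment data.
table2_roots = [5.915, 0.085, 0.0, 0.0, 0.0, 0.0]
one_dim_enough = dimensions_needed(table2_roots, 0.95)
two_dims_needed = dimensions_needed(table2_roots, 0.99)
```

A fixed target of this kind does not replace the second guide; the break in the series of roots still has to be judged by inspection.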

A single dependent variable has been involved in the discussion to this
point; however, the extension of factor analysis to a model for three mode
factor analysis permits the investigation of cases when measures are made on
several dependent variables for each pattern of values of the independent
variables. For an example of this class of data consider the complex tracking
task investigated by Parker and Fleishman (1960). The subject was to control
a dot on a cathode ray oscillograph using a control stick and rudder pedals
as in a standard aircraft control system. Movement of the dot was introduced
electronically and the subject's task was to keep the dot centered on the
oscillograph as well as to avoid sideslip, which was indicated by a separate
meter. Measures were obtained of horizontal error, vertical error, sideslip
error, and time-on-target for each of a number of stages of practice. The study
by Parker and Fleishman involved 203 individuals, 10 stages of practice, and
the four dependent variables listed above. These may be interpreted as the
three modes for identification of the data as defined by Tucker (1964, 1966).
The model for three mode factor analysis that would be appropriate in the
present context would be an extension from equation (9) (present use of letters
is not to be confused with previous use).

The observed data are denoted by $x_{ijk}$, which are entries in a three mode
matrix X with rows for individuals, columns for stages of practice, and a third
mode for the dependent variables. Parameters of the model are contained in the
two mode matrices A, B, and C plus the three mode matrix G. The matrix A
has rows for observed individuals and columns for idealized individuals; matrix
B has rows for observed stages of practice and columns for idealized
stages of practice; matrix C has rows for observed dependent variables and
columns for idealized dependent variables. Matrix G is called the core matrix
and has measures of the idealized dependent variables at the idealized
stages of practice for the idealized individuals. Again, $\epsilon_{ijk}$ is a
random variable with mean = 0 and standard deviation = $\sigma_{ijk}$. Analysis
methods are considerable extensions of the type outlined earlier. The major
aspect of this analysis especially relevant for this conference involves the
study of the matrix A for clusters of individuals. If such clusters exist, then
the relations of the dependent variables on the independent variables would be
the same for individuals in each cluster and different for individuals from
different clusters.
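
How the matrices A, B, C, and G combine to give fitted values can be sketched as below. The sketch assumes the usual form of the three mode model, in which each fitted value is the sum over idealized individuals m, idealized stages p, and idealized variables q of a_im b_jp c_kq g_mpq, with the error term left out; that form is an assumption here, since the equation itself does not appear in the text above. The function name `three_mode_fit` and the toy matrices are hypothetical.

```python
def three_mode_fit(a, b, c, g):
    """Fitted values x[i][j][k] = sum over m, p, q of
    a[i][m] * b[j][p] * c[k][q] * g[m][p][q] (error term omitted)."""
    return [[[sum(a_i[m] * b_j[p] * c_k[q] * g[m][p][q]
                  for m in range(len(g))
                  for p in range(len(g[0]))
                  for q in range(len(g[0][0])))
              for c_k in c]
             for b_j in b]
            for a_i in a]


# Hypothetical toy sizes: 2 individuals, 2 stages, 1 dependent variable,
# with 2 idealized individuals, 1 idealized stage, 1 idealized variable.
a = [[1.0, 0.0], [0.0, 1.0]]   # observed individuals x idealized individuals
b = [[1.0], [2.0]]             # observed stages x idealized stages
c = [[1.0]]                    # observed variables x idealized variables
g = [[[3.0]], [[5.0]]]         # core matrix g[m][p][q]
fitted = three_mode_fit(a, b, c, g)
```

In this toy case each observed individual coincides with one idealized individual, so the two rows of A fall into two distinct clusters and the fitted practice curves differ between them.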

Major emphasis has been placed on the categorization of individuals as
to the relations between variables. To attempt to establish categories
according to the values of measures such that all individuals within a category
can be considered as replicates and for which there would be the same
expectation as to other measures seems to be an unrealistic and hopeless
endeavor. If categories can be established as to the relations between
measures, the individuals within a category could differ while the same dynamic
laws applied. These categories would be especially interesting and relevant in
scientific knowledge as well as in application to many problems.

Table 1

Matrix X of Individual Interpoint Distances
Fictitious Color Judgment Data*

Stimulus                 Individuals
Pairs        1     2     3     4     5     6

a-b          8     7     8     8    10     8
a-c         14    13    16    16    18    16
a-d         18    17    20    18    19    11
a-e         20    18    22    18    16     4
a-f         18    17    20    18    19    11
a-g         14    13    16    16    18    16
a-h          8     7     8     8    10     8

b-c          8     7     8     8    10     8
b-d         14    13    16    13    11     3
b-e         18    17    20    18    19    11
b-f         20    18    22    21    23    17
b-g         18    17    20    20    24    20
b-h         14    13    16    16    18    16

c-d          8     7     8     8    10     8
c-e         14    13    16    16    18    16
c-f         18    17    20    20    24    20
c-g         20    18    22    22    26    22
c-h         18    17    20    20    24    20

d-e          8     7     8     8    10     8
d-f         14    13    16    16    18    16
d-g         18    17    20    20    24    20
d-h         20    18    22    21    23    17

e-f          8     7     8     8    10     8
e-g         14    13    16    16    18    16
e-h         18    17    20    18    19    11

f-g          8     7     8     8    10     8
f-h         14    13    16    13    11     3

g-h          8     7     8     8    10     8

Stimulus colors:  a - Red             e - Green
                  b - Orange          f - Blue-Green
                  c - Yellow          g - Blue
                  d - Yellow-Green    h - Purple

* Designed in accord with results reported by Helm
and Tucker (1962).

Table 2

Characteristic Components Analysis of Matrix X

Roots of (k²X'X)*          Characteristic Scores of Individuals
                           on Characteristic Components

Component    Root          Individual       I       II

    I       5.915               1          .97     -.09
   II        .085               2          .88     -.08
  III       0.000               3         1.07     -.10
   IV       0.000               4         1.02     -.02
    V       0.000               5         1.14      .07
   VI       0.000               6          .85      .24

Loadings of Stimulus Pairs on
Characteristic Components

Stimulus Pair       I        II

a-b                8.4       5.6
a-c               15.5      10.3
a-d               17.4     -17.6
a-e               16.7     -42.7
a-f               17.4     -17.6
a-g               15.5      10.3
a-h                8.4       5.6

b-c                8.4       5.6
b-d               11.8     -30.2
b-e               17.4     -17.6
b-f               20.4      -2.2
b-g               20.2      13.5
b-h               15.5      10.3

c-d                8.4       5.6
c-e               15.5      10.3
c-f               20.2      13.5
c-g               21.9      14.6
c-h               20.2      13.5

d-e                8.4       5.6
d-f               15.5      10.3
d-g               20.2      13.5
d-h               20.4      -2.2

e-f                8.4       5.6
e-g               15.5      10.3
e-h               17.4     -17.6

f-g                8.4       5.6
f-h               11.8     -30.2

g-h                8.4       5.6

* k² = N / Trace (X'X)

Table 3

Interpoint Distances for Idealized Individuals


Scores of Idealized Individuals on
Characteristic Components

Idealized
Individual       I       II

    A           .97     -.09
    B           .73      .28

Interpoint Distances for
Idealized Individuals

Stimulus      Idealized Individual
Pairs            A       B

a-b              8       8
a-c             14      14
a-d             18       8
a-e             20       0
a-f             18       8
a-g             14      14
a-h              8       8

b-c              8       8
b-d             14       0
b-e             18       8
b-f             20      14
b-g             18      18
b-h             14      14

c-d              8       8
c-e             14      14
c-f             18      18
c-g             20      20
c-h             18      18

d-e              8       8
d-f             14      14
d-g             18      18
d-h             20      14

e-f              8       8
e-g             14      14
e-h             18       8

f-g              8       8
f-h             14       0

g-h              8       8


Figure 1

Characteristic Component Space for Individuals
(Plotted points labeled: Idealized Individual A, Idealized Individual B)


Figure 2

Color Perceptual Spaces for Idealized Individuals


THE MAXOF CLUSTERING MODEL¹
Raymond  E.  Christal 

Air  Force  Personnel  Research  Laboratory 

and 

Joe  H.  Ward,  Jr. 

Southwest  Educational  Development  Laboratory 


This  paper  describes  a  highly  flexible  technique  for  clustering 
people  or  things  into  categories.  If  a  complete  hierarchical  structure 
is  desired,  such  as  the  establishment  of  a  biological  taxonomy,  then 
the  MAXOF  Clustering  Model  will  yield  an  optimum  solution  in  terms  of  a 
criterion  established  by  the  investigator.  If  the  desire  is  to  cluster 
a  large  number  of  people  or  things  into  mutually  exclusive  categories, 
then  an  optimum  solution  in  the  strictest  mathematical  sense  is  not 
feasible, even with modern high-speed computers. However, the MAXOF
Clustering  Model  yields  a  near-optimum  solution  which  has  passed  the  test 
of  customer  satisfaction. 

In  using  the  MAXOF  Model,  the  investigator  must  make  three  major 
decisions.  First,  he  must  define  a  way  of  expressing  the  similarity 
among  the  things  or  people  to  be  clustered.  The  model  makes  no  demands 
on  the  form  of  this  overlap  function.  It  can  be  correlation  coefficients, 
co-variances,  cross-training  times,  distance  functions  or  measures  of  the 
homogeneity  of  regression  equations.  Any  function  is  legitimate  which  can 
be  quantified,  and  which  serves  the  investigator's  purpose.  Second,  the 
investigator  must  define  an  objective  function  which  is  to  be  maximized 
during  the  clustering  process.  For  example,  the  investigator  may  wish  to 
maximize  the  average  intercorrelation  among  items  within  clusters — or  to 
minimize the average squared distance (d²) between items within clusters.
Again,  there  is  no  restriction  on  the  form  of  the  objective  function,  except 
that  it  be  feasible  to  compute.  Third,  the  investigator  must  decide  on  the 
appropriate  number  of  clusters  to  report.  Problems  associated  with  this 
decision  will  be  discussed  in  detail. 

The  MAXOF  Clustering  Model  takes  its  name  from  the  concept  of  MAXimizing 
an Objective Function, which is its most unique and useful characteristic.
The model was first described by Joe H. Ward, Jr., in a paper published in
March 1961, under the title "Hierarchical Grouping to Maximize Pay-Off"
(Ward, 1961). R. A. Bottenberg and R. E. Christal described in detail a
specific application of the model in a paper published this same month
entitled "An Iterative Technique for Clustering Criteria Which Retains
Optimum Predictive Efficiency" (Bottenberg & Christal, 1961). Since 1961,
the MAXOF Clustering Model has been applied to many operational problems,
with  gratifying  results.  The  major  purpose  of  this  paper  is  to  describe 
the  model  in  sufficient  detail  for  readers  to  determine  its  applicability 
to  their  own  problems  of  interest.  For  this  reason,  stress  will  be  given 
to  a  discussion  of  the  basic  concepts  underlying  the  model,  and  to  a 
description  of  previous  applications  of  the  model  to  actual  problems. 
Readers  interested  in  more  detail  may  obtain  copies  of  the  references  or 
write  directly  to  one  of  the  authors. 


GENESIS 

It  all  began  when  Hq  USAF  asked  for  development  of  an  improved  method 
for  grouping  Jobs  into  specialty  clusters.  Let's  expand  on  this  for  a 
moment.  The  basic  management  unit  in  the  Air  Force  is  the  Air  Force 
Specialty.  Every  Job  in  the  Air  Force  has  been  assigned  a  specialty  code 
number  by  a  local  manpower  officer.  Every  man  in  the  Air  Force  also  has 
been  assigned  a  specialty  code  number,  indicating  that  he  is  primarily 
trained  to  perform  Jobs  in  that  specialty. 

Enlisted  personnel  in  the  Air  Force  change  Jobs  on  an  average  of  once 
every  two  years,  and  can  be  moved  freely  from  any  Job  to  any  other  Job 
having  the  same  specialty  number.  When  an  airman  changes  Jobs,  a  major 
cost  to  the  Air  Force  is  the  amount  of  time  required  for  him  to  reach  the 
same level of proficiency in his new Job as he had attained in the Job from
which  he  was  transferred.  If  Jobs  within  specialties  are  not  homogeneous, 
the  Air  Force  pays  in  two  ways.  First,  it  must  continually  support  a  large 
and  expensive  retraining  program;  and  second,  at  any  point  in  time,  large 
numbers  of  men  will  not  have  reached  proficiency  in  their  current  assignment. 

It  seems  clear  that  Jobs  should  be  grouped  into  specialties  in  a 
manner  which  minimizes  the  overall  cross-training  time  among  Jobs  within 
specialties. 

Suppose  we  had  the  cross-training  times  among  a  thousand  Air  Force 
Jobs.  How  would  we  go  about  clustering  them  into  specialties  so  as  to 
minimize  the  average  cross-training  time  among  Jobs  within  specialties? 

The  first  urge  is  to  transform  the  data  into  a  matrix  of  intercorrelations 
or d²'s, for common clustering techniques usually require one of these data
forms as input. But why distort reality? If the goal is to minimize
cross-training times among Jobs within clusters, then our input matrix should
be cross-training time values.

Having settled on the nature of the input matrix, we now turn our
attention to the problem of clustering the Jobs into specialties so as to
meet our objective.

But  hold  on!  As  stated,  our  objective  is  met  before  any  clustering 
takes  place.  If  each  Job  is  considered  to  be  a  separate  specialty,  then 
the  average  cross-training  time  among  Jobs  within  specialties  is  zero. 
Furthermore, as the number of clusters (specialties) is reduced, this
value  must  increase  to  the  extent  that  Jobs  are  not  identical. 

Yet  the  whole  purpose  of  clustering  Jobs  in  the  first  place  is  to 
provide  flexibility  to  management.  The  Air  Force  could  not  possibly 
maintain  separate  training  courses  for  every  Job.  Nor  could  it  move 
personnel  to  meet  changing  priorities  unless  Jobs  and  individuals  are 
clustered  into  a  limited  number  of  management  categories. 

It  is  clear  that  the  larger  the  number  of  specialties  (management 
categories),  the  smaller  will  be  the  average  cross-training  value.  At 
the same time, it also is clear that the smaller the number of specialties,
the  easier  and  less  expensive  it  will  be  to  manage  the  personnel  system. 

What  is  needed  is  an  optimum  solution  for  every  possible  number  of 
clusters;  then  management  can  decide  on  the  correct  number  to  implement 
by  weighing  retraining  costs  against  the  cost  of  managing  a  given  number 
of  classification  categories  (specialties). 

But how do we obtain an optimum solution for every possible number of
clusters? The most direct way would be to have the computer evaluate
every possible configuration at every possible level. For example, at
the 50 cluster level, the computer would systematically evaluate every
possible way of assigning the 1,000 Jobs into 50 specialties. It would
then report that particular configuration which yields the smallest
average cross-training time among Jobs within clusters. The same approach
would be taken at the 49 cluster level, and so on. Is such an approach
feasible? Definitely not. All the computers in the world, working in
perfect harmony, could not begin to provide the solution in a lifetime.
(See letter in Appendix.)
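
The size of the exhaustive search can be made concrete with a short calculation. The sketch below is an illustration added here, not part of the authors' argument: it counts the ways of dividing n distinct Jobs into k non-empty, unordered groups (the Stirling number of the second kind), using the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1). The function name `stirling2` is a hypothetical choice.

```python
def stirling2(n, k):
    """Number of ways to partition n labeled items into k non-empty,
    unlabeled groups (Stirling number of the second kind)."""
    # row[j] holds S(i, j) for the current i; start with i = 0.
    row = [1] + [0] * k
    for i in range(1, n + 1):
        new = [0] * (k + 1)
        for j in range(1, min(i, k) + 1):
            # Item i either joins one of j existing groups or starts a new one.
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]


# Configurations to evaluate at the 50 cluster level for 1,000 Jobs.
ways = stirling2(1000, 50)
digits = len(str(ways))
```

For 1,000 Jobs and 50 specialties the count runs to more than a thousand decimal digits, and that is for a single level of the hierarchy.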

Another  approach  must  be  found — one  which  approximates  an  optimum 
solution,  but  which  is  feasible  to  compute. 

It is at this point that the concept of systematically collapsing
clusters so as to maximize an objective function comes to mind. The
concept is extremely simple, and can be described in terms of its
application to the Job-grouping problem.

First we must define our objective function, which in this case is
the average cross-training time among jobs within clusters (specialties).
We begin with each of the 1,000 Jobs in a separate cluster. At this stage
the value of the objective function is zero. Next we have the computer
evaluate every possible way of reducing the number of clusters from 1,000
to 999. For each of the 499,500 alternatives we can compute the average
cross-training time among Jobs within clusters.

It  turns  out  that  the  computer  will  cluster  the  two  Jobs  having  the 
smallest  average  cross-training  time.  At  the  next  stage  we  have  the 
computer  evaluate  every  possible  way  of  reducing  the  number  of  clusters 
from  999  to  998,  through  collapsing  two  of  the  existing  clusters  into  a 
single  cluster.  It  may  do  this  by  placing  one  of  the  998  ungrouped  Jobs 
in  the  same  specialty  as  the  first  pair  clustered,  so  that  we  end  up  with 
a  three-job  cluster.  Or,  it  may  cluster  a  second  pair  of  similar  Jobs,  so 
that  we  end  up  with  two  clusters  containing  two  Jobs  and  with  each  of  the 
remaining 996 clusters containing a single Job. All 498,501 possible
configurations are evaluated, and that one is selected which yields the smallest
value  of  our  objective  function.  This  process  of  reducing  the  number  of 
clusters  by  one  at  each  stage  is  continued,  until  all  Jobs  are  in  a  single 
cluster.  In  each  instance  all  possible  alternatives  involving  the  collapse 
of  two  existing  clusters  are  considered,  and  that  alternative  is  accepted 
which  is  evaluated  as  "best"  by  the  objective  function. 
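
The stage-by-stage collapsing procedure just described can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' program: the function name `maxof_hierarchy` and the four-Job cost table are hypothetical, cross-training times are assumed symmetric, and the objective is taken to be the mean cross-training time over all pairs of Jobs that share a cluster (zero when every Job stands alone). Any other computable objective could be substituted at the line where `value` is formed.

```python
from itertools import combinations


def maxof_hierarchy(cost):
    """Greedy hierarchical grouping: at each stage merge the two clusters
    that give the smallest mean within-cluster cost over all pairs.

    cost[i][j] is the (assumed symmetric) cross-training time between
    Jobs i and j. Returns a list of (clusters, objective) per stage.
    """
    clusters = [[i] for i in range(len(cost))]
    total, pairs = 0.0, 0          # running sum and count of within-cluster pairs
    history = [([c[:] for c in clusters], 0.0)]
    while len(clusters) > 1:
        best = None
        # Evaluate every way of collapsing two existing clusters into one.
        for a, b in combinations(range(len(clusters)), 2):
            cross = sum(cost[i][j] for i in clusters[a] for j in clusters[b])
            n_cross = len(clusters[a]) * len(clusters[b])
            value = (total + cross) / (pairs + n_cross)
            if best is None or value < best[0]:
                best = (value, a, b, cross, n_cross)
        value, a, b, cross, n_cross = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]            # b > a, so index a is unaffected
        total += cross
        pairs += n_cross
        history.append(([sorted(c) for c in clusters], value))
    return history


# Hypothetical cross-training times among four Jobs.
times = [[0, 1, 9, 8],
         [1, 0, 7, 9],
         [9, 7, 0, 2],
         [8, 9, 2, 0]]
stages = maxof_hierarchy(times)
```

Each entry of `stages` pairs a grouping with its objective value, which is the information management would weigh against the cost of maintaining that number of specialties. The choice made at each stage is the best single merge, so the groupings at later stages are not guaranteed to be the best possible for that number of clusters.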

Thus,  we  end  up  with  a  solution  for  each  possible  number  of  clusters. 

We  also  have  exact  information  concerning  the  average  cross-training  time 
among  Jobs  within  specialties  at  each  stage,  which  can  be  weighed  against 
management  costs  in  order  to  arrive  at  a  Judgment  concerning  the  optimum 
number  of  specialty  clusters  to  maintain. 

We were anxious to try this new approach, but unfortunately we did
not have a matrix of cross-training times among jobs. However, within a
few months Dr Marion E. Hook (Hook & Masser, 1962) had gathered rank-order
estimates of the time required for cross-training among 98 existing airman
specialties. A complete hierarchical clustering of these data was obtained
using the MAXOF model.

Since the 98 specialties had been selected from 40 career fields, we
were in a position to compare results of the MAXOF clustering solution at
the 40 group stage with the career field membership of these specialties.
We found the average cross-training time within groups identified by the
MAXOF Model to be markedly smaller than average cross-training time within
the official career field groups. We were encouraged with the results, since
the clustering technique appeared to operate as predicted.


IDENTIFYING JOB TYPES


It  wasn't  long  before  we  found  another  application  of  the  grouping 
technique  which  proved  to  have  considerable  pay-off  to  the  Air  Force. 

The  Personnel  Research  Laboratory  had  been  asked  to  develop  improved 
methods  for  collecting,  analyzing,  and  reporting  information  describing 
enlisted  and  officer  Jobs. 

We  spent  the  first  year  studying  various  job  analysis  techniques  and 
Air  Force  needs  for  Job  information.  The  greatest  problem  was  concerned 
with  how  to  collect  information  in  a  form  so  that  it  could  be  quantified 
and  subjected  to  machine  analysis. 

Eventually it became clear that a task inventory procedure had greatest
potential for satisfying our requirements. Since that decision was made,
we  have  conducted  a  series  of  studies  concerning  how  task  inventories  should 
be  constructed  and  administered,  and  how  the  resulting  information  should 
be  analyzed  and  reported. 

Our  procedures  for  constructing  inventories  are  relatively  straight¬ 
forward.  In  the  enlisted  area,  for  example,  the  instrument  is  simply  a 
list  of  all  the  significant  tasks  performed  by  individuals  working  in  a 
single  promotion  career  ladder.  That  is,  it  consists  of  the  tasks  per¬ 
formed  by  airmen  working  at  the  apprentice,  Journeyman,  supervisor,  and 
superintendent  levels  in  one  of  the  more  than  200  career  ladders  which  the 
Air  Force  has  established  for  management  control.  This  inventory  is 
administered  by  test  control  officers  to  individuals  working  in  the  career 
ladder  at  Air  Force  installations  throughout  the  world.  A  worker  is  asked 
to  check  those  tasks  which  he  performs  as  part  of  his  normal  Job,  and  to 
indicate  how  his  worktime  is  distributed  across  those  tasks.  He  also  fills 
in  a  background  information  section,  where  he  indicates  such  things  as  his 
base,  command,  grade,  time  on  the  job,  courses  taken  or  equipment  worked  on. 

The  completed  inventories  are  sent  to  the  Laboratory,  where  the  data 
are  keypunched  and  transferred  to  magnetic  tape.  Without  going  into  detail, 
let  me  simply  state  that  a  series  of  studies  has  indicated  that  we  get 
high-quality  information  using  these  instruments. 

Once  the  data  are  in  the  computer,  they  are  analyzed  by  fifteen  or 
twenty  programs  in  order  to  produce  reports  tailored  to  meet  the  needs  of 
various  using  agencies  (Morsh  &  Christal,  In  Press). 

One  program  is  a  general-purpose  information  retrieval  system.  It 
enables  the  investigator  to  produce  a  consolidated  description  of  the  work 
being  performed  by  any  specified  group  of  workers.  A  special  group  can  be 
identified in terms of values on as many as nine background variables,
through use of a series of "and" and "or" statements. For example, one
might ask for a description of the work being performed by airmen who have
been in the Air Force less than two years; who bypassed the basic training
course;  who  are  less  than  19  years  of  age;  who  failed  to  complete  high 
school;  and  who  are  working  in  overseas  jobs  in  the  Pacific  Air  Force 
Command. 

Figure  1  presents  the  top  portion  of  a  typical  consolidated  Job 
description.  This  one  describes  the  work  being  performed  by  39**  Journeymen 
medical  laboratory  technicians  working  in  hospitals  and  clinics  throughout 
the world. Notice the four columns of numbers printed to the right of the
task  statements.  The  first  column  indicates  the  per  cent  of  members  in 
this  group  performing  the  listed  task.  The  second  column  reports  the  per 
cent  of  worktime  spent  on  the  task  by  individuals  who  perform  it.  The 
third column presents the per cent of the entire group's worktime spent
on  the  task.  This  third  column  is  the  main  element  of  the  description, 
since it accounts for the worktime of all cases. Tasks in the Job description
are arranged in descending order based on the magnitude of the values
in  this  column.  Thus,  "collect  blood  specimen  directly  from  patients"  is 
the  most  time-consuming  task  performed  by  Journeymen  laboratory  technicians. 
By  the  time  you  read  through  the  third  task,  you  have  accounted  for  U . 30 
per  cent  of  the  worktime  for  this  group.  This  is  seen  by  looking  at  the 
value in Column 4, which presents the cumulative sum of the values in Column
3. 
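
The four columns can be reproduced from individual time estimates as follows. This is a sketch under stated assumptions: each worker is represented as a mapping from task to per cent of his own worktime, and the task names and values are invented for illustration rather than taken from Figure 1.

```python
# Each worker: task -> per cent of that worker's worktime (hypothetical data).
workers = [
    {"collect blood": 50.0, "perform urinalysis": 50.0},
    {"collect blood": 30.0, "prepare slides": 70.0},
    {"perform urinalysis": 100.0},
    {"collect blood": 40.0, "perform urinalysis": 20.0, "prepare slides": 40.0},
]

def consolidated_description(workers):
    """Return rows of [task, col1, col2, col3, col4], sorted by column 3."""
    n = len(workers)
    tasks = {t for w in workers for t in w}
    rows = []
    for t in tasks:
        times = [w[t] for w in workers if w.get(t, 0.0) > 0.0]
        pct_performing = 100.0 * len(times) / n        # column 1
        pct_time_performers = sum(times) / len(times)  # column 2
        pct_group_time = sum(times) / n                # column 3
        rows.append([t, pct_performing, pct_time_performers, pct_group_time])
    # Descending order of group worktime; task name breaks ties.
    rows.sort(key=lambda r: (-r[3], r[0]))
    cumulative = 0.0
    for r in rows:
        cumulative += r[3]
        r.append(cumulative)                           # column 4
    return rows

rows = consolidated_description(workers)
for r in rows:
    print(r)
```

Note that column 3 equals column 1 times column 2 divided by 100, and that column 4 reaches 100 per cent at the last task when every worker's time values sum to 100.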


Insert  Figure  1  about  here 


While  this  Job  description  is  an  excellent  statement  of  the  work 
performed  by  Journeymen  laboratory  technicians  as  a  group,  it  may  not  be 
an  accurate  description  of  what  any  one  man  does.  Commanders  of  local 
hospitals  and  clinics  can  engineer  Jobs  any  way  they  please  in  order  to 
accomplish  their  mission  effectively.  It  might  be  that  the  Jobs  in  larger 
hospitals  are  highly  specialized,  so  that  an  individual  worker  performs  only 
a  small  subset  of  the  tasks.  The  Air  Force  wanted  to  know  how  work  is 
organized  in  the  field.  They  wanted  to  identify  and  define  all  of  the  Job 
types  in  each  career  ladder,  and  find  out  where  each  Job  type  exists  and 
who  is  working  in  it. 

It  seemed  reasonable  that  if  we  had  a  detailed  description  of  the  work 
performed  by  each  individual  in  a  career  ladder,  there  should  be  a  way  to 
cluster individuals having similar Jobs. Hopefully this could be accomplished
using  the  MAXOF  Clustering  Model. 

The  first  requirement  was  to  develop  a  matrix  of  values  defining  the 
overlap of each Job with every other Job. Several measures of Job similarity
were considered. Two of these potential overlap functions reflected the
extent to which any two Jobs contained identical tasks. One was simply
the  number  of  tasks  in  either  Job,  divided  into  the  number  of  tasks  common 
to both Jobs. The second was the per cent of tasks in Job A which were
also in Job B, averaged with the per cent of tasks in Job B which were also
in  Job  A.  Both  of  these  measures  of  task  overlap  were  later  discarded  in 
favor  of  a  value  indicating  common  worktime. 
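
For comparison, the two discarded task-overlap measures might be written as below. This is a sketch only; the function names are ours, both values are expressed as per cents, and each Job is assumed to be given as a non-empty collection of task labels.

```python
def overlap_either(job_a, job_b):
    """Tasks common to both Jobs as a per cent of tasks in either Job."""
    a, b = set(job_a), set(job_b)
    return 100.0 * len(a & b) / len(a | b)

def overlap_averaged(job_a, job_b):
    """Average of (per cent of A's tasks also in B) and (per cent of B's also in A)."""
    a, b = set(job_a), set(job_b)
    common = len(a & b)
    return 0.5 * (100.0 * common / len(a) + 100.0 * common / len(b))

# Hypothetical Jobs described only by the tasks they contain.
job_a = ["t1", "t2", "t3", "t4"]
job_b = ["t3", "t4", "t5"]
print(overlap_either(job_a, job_b), overlap_averaged(job_a, job_b))
```

Neither measure uses the time spent on a task, which is one reason a worktime-based value is more informative for Job descriptions.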

This  common  worktime  value  is  obtained  for  a  pair  of  Jobs  by  summing 
the  smaller  of  the  two  time  values  associated  with  each  task  in  the 
inventory. Thus in the example given in Figure 2, the common worktime
value  is  80  per  cent.  Notice  that  by  reallocating  20  per  cent  of  the  time 
values  on  either  of  the  two  descriptions,  one  can  perfectly  reproduce  the 
other. 
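
The computation is a sum of task-by-task minima. The sketch below uses two invented descriptions (not the values shown in Figure 2) chosen so that the result is also 80 per cent; the function name is an assumption.

```python
def common_worktime(job_a, job_b):
    """Sum over all tasks of the smaller of the two per cent time values."""
    tasks = set(job_a) | set(job_b)
    return sum(min(job_a.get(t, 0.0), job_b.get(t, 0.0)) for t in tasks)

# Hypothetical descriptions: task -> per cent of worktime (each sums to 100).
job_a = {"task 1": 50.0, "task 2": 30.0, "task 3": 20.0}
job_b = {"task 1": 40.0, "task 2": 30.0, "task 3": 10.0, "task 4": 20.0}

print(common_worktime(job_a, job_b))  # 40 + 30 + 10 + 0
```

Here moving 10 per cent from task 1 and 10 per cent from task 3 of the first description into task 4 reproduces the second description, which is the 20 per cent reallocation referred to above.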


Insert  Figure  2  about  here 


Once  we  have  computed  a  matrix  of  overlap  values,  we  next  must  define 
our  objective  function,  or  the  decision  rule  to  be  used  in  the  grouping 
operation.  In  this  case,  we  decided  to  group  in  a  manner  that  maximizes 
retention  of  descriptive  accuracy.  Thus,  we  begin  with  a  separate  Job 
description for each individual in our sample. At this stage we can perfectly
describe the worktime of all workers. Next, we evaluate every possible
way of describing the worktime of all workers using N-1 descriptions. At
the  end  of  this  stage  we  must  describe  the  Jobs  of  two  workers  with  a  single 
description.  To  the  extent  these  two  Jobs  are  not  exactly  identical,  this 
single  description  will  make  a  small  error  in  defining  the  worktime  of  the 
two  individuals.  It  can  be  shown  that  this  error  will  be  minimized  if  we 
select  the  two  Jobs  having  the  highest  common  worktime  value.  Thus,  we 
can  locate  the  first  two  Jobs  to  be  clustered  by  identifying  the  highest 
value  in  our  original  overlap  matrix.  The  composite  description  for  these 
two  Jobs  is  simply  an  average  of  the  worktime  on  each  task  in  the  inventory. 

At  the  second  stage  we  evaluate  every  possible  way  of  reducing  the 
number  of  descriptions  by  one.  The  possibilities  include  locating  a  third 
Job  similar  to  the  first  pair,  and  describing  all  three  with  a  single 
description,  or  finding  a  second  pair  of  similar  Jobs  to  be  defined  by  a 
composite  description.  The  process  is  continued,  defining  the  worktime  of 
all  cases  with  one  less  description  at  each  stage,  until  we  reach  the  last 
stage — where  we  attempt  to  define  all  Jobs  with  a  single  description. 
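
The stage-by-stage reduction can be sketched in a few lines. This is an illustration, not the laboratory's program: the merging rule used here (collapse the two groups with the highest average pairwise common worktime between their members) is an assumption standing in for the objective function, and the four Jobs are invented.

```python
from itertools import combinations

def common_worktime(job_a, job_b):
    tasks = set(job_a) | set(job_b)
    return sum(min(job_a.get(t, 0.0), job_b.get(t, 0.0)) for t in tasks)

def hierarchical_grouping(jobs):
    """Return [(groups_remaining, merged_members, score), ...] for N-1 stages."""
    n = len(jobs)
    overlap = {(i, j): common_worktime(jobs[i], jobs[j])
               for i, j in combinations(range(n), 2)}

    def between(g, h):
        # Average overlap over all pairs with one member in each group.
        vals = [overlap[(min(i, j), max(i, j))] for i in g for j in h]
        return sum(vals) / len(vals)

    groups = [[i] for i in range(n)]
    history = []
    while len(groups) > 1:
        # Evaluate every possible way of reducing the number of groups by one.
        p, q = max(combinations(range(len(groups)), 2),
                   key=lambda pq: between(groups[pq[0]], groups[pq[1]]))
        score = between(groups[p], groups[q])
        merged = sorted(groups[p] + groups[q])
        groups = [g for k, g in enumerate(groups) if k not in (p, q)] + [merged]
        history.append((len(groups), merged, score))
    return history

jobs = [
    {"a": 60.0, "b": 40.0},
    {"a": 50.0, "b": 50.0},
    {"c": 70.0, "d": 30.0},
    {"c": 50.0, "d": 50.0},
]
history = hierarchical_grouping(jobs)
for stage in history:
    print(stage)
```

The sharp drop in the score at the last stage is the kind of signal used when working backwards through the solution to decide how many Job types to retain.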

In  order  to  determine  the  number  of  Job  types  in  the  ladder,  we  normally 
work  backwards  through  the  solution.  Ordinarily,  we  can  quickly  eliminate 
the  one  description  stage  because  of  the  magnitude  of  the  error.  If  the 
error is too large at the two-group stage, we look at the three-group stage.

We  proceed  in  this  manner  until  we  reach  a  point  where  we  cannot  tell  from 
the  error  term  alone  whether  the  clusters  being  merged  are  similar  enough 
to be considered as being in the same Job type. In order to help us reach
a  decision  point,  we  have  the  computer  provide  us  with  consolidated 
descriptions  of  the  groups  being  merged  at  several  stages  around  the 
decision  point.  We  may  find,  for  example,  that  we  cannot  detect  meaningful 
differences  between  the  two  groups  being  merged  at  the  25-group  stage. 
However,  we  may  discover  that  in  order  to  reduce  the  number  of  Job-type 
descriptions from 25 to 24, the computer merges two groups which are
different  in  some  significant  respect.  If  so,  we  conclude  that  there  are 
25  significant  Job  types,  and  we  have  the  computer  publish  consolidated 
descriptions  of  the  work  in  each  Job  type. 

In  the  course  of  obtaining  a  complete  hierarchical  grouping  of  a 
2,000-Job input matrix, the computer evaluates 1,333,333,000 possible
configurations.  However,  problems  of  this  magnitude  are  now  accomplished 
on  a  routine  basis  without  difficulty.  Job-type  analyses  of  some  forty 
career  ladders  have  been  completed,  and  in  each  instance  the  results  have 
given  us  a  clear  picture  of  the  way  that  work  is  organized  in  the  field. 
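
A count of this order follows if one assumes that every pair of groups existing at a stage is evaluated before the best pair is collapsed. The sketch below makes that assumption explicit; the function name is ours.

```python
from math import comb

def merges_evaluated(n):
    """Total pairs examined when a stage with k groups evaluates all C(k, 2) pairs."""
    return sum(comb(k, 2) for k in range(2, n + 1))

total = merges_evaluated(2000)
print(total)
```

Under this assumption the total has the closed form (N^3 - N) / 6, which for N = 2,000 is 1,333,333,000.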

For  example,  sixteen  clearly  defined  Job  types  were  identified  in  the 
medical  laboratory  career  ladder.  The  reader  will  find  the  top  tasks 
from  several  of  these  Job  type  descriptions  listed  in  the  appendix. 


CRITERION  GROUPING 

It  wasn't  long  before  we  discovered  a  new  application  of  the  MAXOF 
Clustering  Model.  In  personnel  classification  and  assignment,  a  primary 
goal  is  to  predict  performance  of  each  individual  in  a  technical  training 
course  associated  with  a  particular  Job  area.  Even  though  a  battery  of 
tests is routinely administered to Air Force volunteers, it has not been
feasible to develop and utilize a separate test composite for predicting
the success of each individual in each technical course. Attempts have
been  made  to  group  related  courses  into  "families"  so  that  a  smaller 
number  of  predictor  composites  could  be  used. 

In  the  Air  Force,  highly  subjective  techniques  have  been  used  for 
grouping  courses  into  homogeneous  families  and  for  determining  the  aptitude 
composite  associated  with  each  family.  In  general,  interrelationships 
among  courses  have  been  estimated  by  intercorrelating  the  patterns  of 
predictor  validities  associated  with  each  school.  The  intercorrelation 
matrix  has  been  factor  analyzed,  and  schools  with  similar  factor  loadings 
have  been  grouped.  Finally,  weights  for  aptitude  composites  have  been 
estimated  by  averaging  validity  coefficients  for  the  predictor  tests 
against schools in a cluster.

It should be noted that factor analysis is not appropriate for this
type  of  clustering  problem,  for  it  is  not  our  goal  to  explain  the  common 
variance in terms of a minimum number of hypothetical constructs. Furthermore,
the rank of the matrix of intercorrelations among technical schools
(which  can  only  be  estimated)  does  not  help  us  to  decide  on  the  optimal 
number of clusters. Finally, even if the factor-analytic approach did
enable  us  to  assign  courses  into  homogeneous  groups,  we  still  would  need 
to  determine  the  weights  which  yield  simultaneous,  optimum  prediction  of 
all  criteria  within  clusters. 

After  reflecting  on  the  matter,  it  seemed  that  the  MAXOF  Clustering 
Model  might  be  applicable.  Details  of  the  system  which  was  finally  worked 
out  would  take  too  much  space  to  describe  in  this  paper.  However,  they 
are  spelled  out  in  a  Technical  Documentary  Report  (Bottenberg  &  Christal, 
1961)  which  is  available  upon  request.  Conceptually,  the  system  begins 
with k separate least-squares regression equations, one for each of k
schools.  A  computing  expression  has  been  developed  which  enables  the 
investigator  to  determine  the  overall  predictive  efficiency  obtained  using 
these  k  separate  equations.  As  the  first  step,  the  two  schools  are  clustered 
whose  associated  regression  equations  are  most  homogeneous,  and  a  single 
set  of  least  squares  weights  is  developed  for  simultaneously  predicting 
criterion  scores  in  both  schools.  Thus,  the  number  of  school  groups  and 
associated  equations  is  reduced  by  one.  The  process  of  reducing  the  number 
of  groups  and  associated  equations  by  one  at  each  step  is  continued  until 
the  one-group  stage  is  reached.  In  each  instance  all  alternatives  are 
considered,  and  that  alternative  is  selected  which  minimizes  loss  of 
overall  predictive  efficiency.  The  number  of  groups  and  equations  to  be 
retained  is  decided  by  weighing  predictive  efficiency  against  the  cost  of 
utilizing  a  given  number  of  prediction  composites. 

This  is  a  considerably  over-simplified  description  of  the  actual 
criterion  grouping  system.  For  example,  the  program  enables  the  investigator 
to  give  weight  to  training  costs,  personnel  quotas,  and  other  factors 
associated  with  a  particular  school.  That  is,  the  program  may  be  oriented 
toward  preserving  predictive  efficiency  for  those  schools  where  the  number 
of  students  and  the  training  costs  are  high  relative  to  other  schools. 

In  those  instances  where  the  criterion  means  and  variances  are  equal, 
computing  expressions  for  the  grouping  system  are  extremely  simple.  Under 
this  condition,  the  computer  program  can  easily  accomplish  a  complete 
hierarchical  grouping  of  nearly  a  thousand  criterion  situations,  using 
predictor  composites  based  on  as  many  as  a  hundred  and  fifty  variables. 

We  were  able  to  locate  criterion  information  and  classification  test 
scores  for  airmen  attending  sixty  Air  Force  technical  schools.  A  complete 
hierarchical grouping of the schools was accomplished. The results revealed
that reduction of the number of predictor equations from 60 to 15 was
associated  with  a  drop  in  overall  R2  from  .56  to  .50.  However,  as  the 
number  of  equations  was  reduced  from  15  to  1,  the  R2  dropped  from  .50 
to  .38. 


JUDGMENT  ANALYSIS 

In  1963,  investigators  at  the  Personnel  Research  Laboratory  were 
experiencing  remarkable  success  in  programming  a  computer  to  simulate 
the  actions  of  decision  makers  (Ward  &  Davis,  1963).  A  subject  was  required 
to  record  a  series  of  decisions  into  the  computer  via  the  console  typewriter. 
The  subject  made  each  decision  after  studying  relevant  information  displayed 
to  him  by  the  typewriter.  The  computer  was  programmed  to  capture  the  policy 
of the subject in the form of an equation developed with the fixed-X multiple
linear regression model. A series of decisions made by the subject was
used  as  the  dependent  variable,  while  the  independent  variables  were 
generated  from  information  provided  to  the  subject  on  the  typewriter.  The 
policy  equation  was  then  cross  validated  against  a  second  series  of  decisions. 
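
Policy capturing of this kind can be sketched with ordinary least squares. The example is an illustration under stated assumptions: the "judge" is simulated by a known rule (decision = 1 + 2*x1 + 3*x2), the cue values are invented, and the small normal-equations solver stands in for whatever regression routine the laboratory used.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def capture_policy(cues, decisions):
    """Least-squares weights (intercept first) predicting decisions from cues."""
    rows = [[1.0] + list(c) for c in cues]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, decisions)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, cue):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], cue))

def r_squared(weights, cues, decisions):
    mean = sum(decisions) / len(decisions)
    ss_tot = sum((y - mean) ** 2 for y in decisions)
    ss_res = sum((y - predict(weights, c)) ** 2 for c, y in zip(cues, decisions))
    return 1.0 - ss_res / ss_tot

def judge(cue):  # hypothetical decision maker with a perfectly consistent policy
    return 1.0 + 2.0 * cue[0] + 3.0 * cue[1]

first_series = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4), (2, 4)]
second_series = [(1, 1), (3, 2), (5, 5), (2, 3)]

weights = capture_policy(first_series, [judge(c) for c in first_series])
cross_validated_r2 = r_squared(weights, second_series,
                               [judge(c) for c in second_series])
print(weights, cross_validated_r2)
```

A real subject is never perfectly consistent, so the cross-validated value on the second series would fall below the value obtained on the first.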

At  that  time,  the  concept  of  policy  capturing  using  the  regression  model 
was  rather  novel.  However,  more  recently  we  have  found  policy-capturing  to 
be  a  powerful  approach  to  many  meaningful  operational  problems.  For  example, 
in one study (Christal, 1965) a Hq USAF board of senior officers reviewed
descriptions  of  3,575  representative  officer  positions  and  made  decisions 
concerning  the  appropriate  grade  level  to  be  associated  with  each.  In  an 
effort  to  identify  the  factors  considered  by  these  board  members  in  making 
their  Judgments,  over  a  hundred  variables  were  hypothesized  and  evaluated. 
Eventually,  a  nine-predictor  equation  was  developed  which  accurately 
reproduced  the  board's  actions.  Subsequently,  this  equation  was  applied  by 
the computer to determine appropriate grades for an additional 10,000 officer
positions. In other applications the model has been used to develop a
mechanized  initial  assignment  system  for  airmen  which  duplicates  actions 
previously  performed  by  assignment  specialists.  A  study  is  planned  to  use 
this  technique  to  develop  a  reassignment  model  which  gives  appropriate 
consideration  to  Job  and  personnel  characteristics.  While  these  applications 
have  been  in  the  military  setting,  the  policy-capturing  model  might  be  used 
to  study  such  diverse  properties  as  the  quality  of  beefstock,  the  beauty 
of pictures, the effectiveness of workers or the quality of English compositions
(Christal, 1966).

As  one  might  expect,  we  have  found  that  individuals  sometimes  differ 
quite  markedly  in  their  policies  concerning  a  particular  type  of  stimulus. 

For  example,  what  might  be  a  beautiful  picture  to  one  Judge  may  be  dull  to 
another.  And  what  might  be  an  excellent  composition  to  one  teacher  may 
appear  unacceptable  to  another.  The  problem,  of  course,  is  that  all 
judges do not have the same value system. We find this problem in the military
setting. If one has a board of officers Judge the relative acceptability
of  a  series  of  applicants  for  the  Air  Force  Academy,  for  example,  there 
will  be  considerable  disagreement.  Some  will  place  high  value  on  previous 
participation  in  high  school  sports  and  on  the  physical  characteristics  of 
applicants.  Others  will  tend  to  place  more  weight  on  academic  aptitudes. 

In  the  past,  even  though  we  might  find  interrater  agreement  to  be  low, 
we  have  simply  averaged  across  all  Judges  in  order  to  determine  final  values. 
However,  it  should  be  recognized  that  even  when  the  level  of  interrater 
agreement  among  an  entire  sample  of  Judges  is  low,  it  might  be  that  the 
Judges  could  be  divided  into  two  or  more  groups  within  each  of  which  there 
is  very  high  agreement.  Conventional  analysis  techniques  for  determining 
interrater  agreement  would  fail  to  detect  this  situation.  It  turns  out 
that  the  criterion  grouping  application  of  the  MAXOF  Clustering  Model  is 
ideal  for  studying  similarities  and  differences  in  rating  policies  (Christal, 
1963).  We  begin  with  a  separate  equation  for  each  Judge,  and  then  we 
cluster  Judges  with  similar  policies  as  measured  by  the  homogeneity  of 
their  associated  equations.  Sometimes  we  find  that  Judges  can  be  nicely 
clustered  into  two  or  three  policy  groups.  In  such  an  instance,  differences 
in  policies  are  pinpointed  for  arbitration. 
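
The situation described above, low agreement across the whole panel but high agreement within subgroups, is easy to demonstrate numerically. In this sketch the four judges and their rankings of six cases are invented, and the average pairwise product-moment correlation is used as a simple stand-in for an interrater agreement index.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_agreement(ratings, judges):
    """Average correlation over all pairs of the named judges."""
    pairs = list(combinations(judges, 2))
    return sum(pearson(ratings[a], ratings[b]) for a, b in pairs) / len(pairs)

# Hypothetical rankings of six cases by four judges.
ratings = {
    "judge_1": [1, 2, 3, 4, 5, 6],
    "judge_2": [1, 3, 2, 4, 6, 5],
    "judge_3": [6, 5, 4, 3, 2, 1],
    "judge_4": [5, 6, 4, 3, 1, 2],
}

overall = mean_agreement(ratings, list(ratings))
group_a = mean_agreement(ratings, ["judge_1", "judge_2"])
group_b = mean_agreement(ratings, ["judge_3", "judge_4"])
print(overall, group_a, group_b)
```

Averaging across all four judges here would produce values that represent neither policy, which is the difficulty the clustering of policy equations is meant to expose.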

As  an  illustration,  there  was  a  group  of  ten  supervisors  in  the  personnel 
department of a large government-owned, government-managed research laboratory
who had been arguing about promotion standards for six years. Each
year  they  had  met  for  three  days  to  discuss  the  matter,  but  without 
reaching agreement. Dr Robert Stephenson of the U.S. Naval Ordnance Test
Station  worked  with  Dr  Ward  in  conducting  a  study  to  resolve  the  problem 
(Stephenson  &  Ward,  In  Press).  First  they  identified  112  items  which 
might  be  related  to  promotion  potential.  Next  each  supervisor  rated  the 
importance  of  each  item  for  evaluating  promotion  potential.  Following  a 
study  of  relevant  variables,  an  analysis  was  performed  in  which  the  position 
of each supervisor was plotted as a point in multi-dimensional space.
"Unfortunately,"  report  the  authors,  "the  knowledge  of  how  similar  one's 
position  was  to  somebody  else's  position  did  not  really  help  the  members 
of  the  group  to  resolve  their  conflicting  opinions.  In  fact,  the  analysis 
tended  to  focus  attention  on  people  relationships  (like  'How  similar  am  I 
to  the  boss.')  rather  than  policy  differences." 

Next  Stephenson  and  Ward  tried  the  "JAN"  technique  (Christal,  1963), 
which  is  nothing  more  than  a  combination  of  policy-capturing  and  the  MAXOF 
Clustering  Model.  First  the  investigators  described  a  sample  of  potential 
promotees in terms of their score values on relevant factors. Then each of
the  ten  supervisors  was  asked  to  rank  the  entire  sample  in  terms  of  Judged 
merit.  The  policy  of  each  supervisor  was  captured  using  the  multiple-linear 
regression  model.  The  supervisors  were  then  clustered  in  terms  of  the 
homogeneity  of  their  equations,  using  the  MAXOF  Model.  Three  distinctive 
policy groups were identified, and three associated joint-policy equations
were developed. These three equations were applied to rank the applicant
sample,  and  the  supervisors  were  asked  to  discuss  and  resolve  differences 
in  the  rank  positions  of  cases  resulting  from  application  of  these  three 
equations.  It  is  interesting  to  note  that  the  supervisors  spent  more  time 
resolving  these  differences  than  they  did  in  making  their  original  rankings. 
However,  as  a  result  of  this  undertaking  the  supervisors  began  to  understand 
each  other's  positions,  and  found  compromise  possible.  Ultimately,  a 
compromise  ranking  was  arrived  at  for  each  controversial  case.  A  single 
new equation was developed which produced an R2 of .932. This is almost unbelievable,
when one considers that the best single overall equation which
could be attained before arbitration produced an R2 of only . U82 , and that
the best equation for an individual supervisor produced an R2 of only .8^8.
The  authors  concluded  that  the  JAN  technique  was  doubly  successful.  The 
supervisors gained an understanding of each other's positions, and they
also  reached  agreement  on  a  matter  over  which  they  had  been  fighting  for 
six  years. 


BIOLOGICAL  TAXONOMIES 

As  mentioned  previously,  the  MAXOF  Clustering  Model  is  ideally  suited 
for  establishing  biological  taxonomies.  It  yields  a  completely-nested 
hierarchical  structure  based  upon  optimization  of  a  criterion  established 
by the investigator. The MAXOF Model was used by a New York botanist
(unpublished  study)  to  establish  a  taxonomy  of  Latin  American  tapioca 
plants.  The  model  is  now  being  applied  to  cluster  a  group  of  tropical 
fish  in  terms  of  the  similarity  of  their  eating  habits.  In  this  instance, 
the  problem  turns  out  to  be  identical  to  the  Job-typing  problem.  In  place 
of  a  Job  description  reporting  the  "per  cent  worktime"  on  each  of  N  tasks, 
we  compute  a  "stomach  content"  description,  reporting  the  per  cent  of  total 
stomach  content  accounted  for  by  each  of  N  foods.  Instead  of  an  input 
matrix of common worktime values, we have an input matrix of common food
values.  If  the  volume  of  food  were  considered  to  be  a  relevant  factor, 
then a matrix of d2's could be generated as input which would give weight
to  differences  in  volume  as  well  as  types  of  food  consumed. 


INTEREST  AREAS  AND 
DOCUMENT  GROUPING 

The  MAXOF  Clustering  Model  was  used  to  group  interest  areas  displayed 
by  scientists  at  the  Personnel  Research  Laboratory.  Results  of  this  study 
(Tomlinson,  1965)  are  reported  in  Figure  3.  One  of  the  advantages  in 
obtaining  a  completely-nested  hierarchical  solution  is  revealed  in  this 
figure.  It  is  possible  to  determine  a  particular  sequencing  of  the  items 
being clustered, such that items appearing in any cluster at any stage are
listed  next  to  one  another.  Thus  in  Figure  3  the  reader  can  view  67  levels 
of the hierarchical solution.

After clustering interest areas, this investigator obtained a transpose
of  the  original  matrix  and  clustered  the  scientists  in  terms  of  the  similarity 
of  their  reading  interests.  Both  solutions  were  accepted  by  personnel  working 
in  the  Laboratory  as  being  a  true  representation  of  reality. 
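
The sequencing property of a completely-nested solution, mentioned above in connection with Figure 3, can be sketched as follows. This is an illustration only: the merge history is invented, each merge is identified by one member of each of the two groups being collapsed, and the function name is an assumption.

```python
def sequence_items(n_items, merges):
    """Order items so that every group formed at any stage is contiguous.

    merges: list of (a, b) pairs, where a and b are any members of the two
    groups collapsed at that stage (assumed to be in different groups).
    """
    order = {i: [i] for i in range(n_items)}  # item -> ordered list for its group
    for a, b in merges:
        combined = order[a] + order[b]        # place the two groups side by side
        for item in combined:
            order[item] = combined
    return order[0]

# Hypothetical history for four items: {0,3} forms, then {1,2}, then all four.
sequence = sequence_items(4, [(0, 3), (1, 2), (0, 1)])
print(sequence)
```

Because groups are only ever placed side by side, a single listing in this order lets the reader mark off the clusters existing at every level of the hierarchy.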

In  a  somewhat  related  study,  an  investigator  at  the  Systems  Development 
Corporation  in  California  reported  (unpublished  study)  that  the  MAXOF  Model 
turned  out  to  be  nearly  ideal  as  a  basis  for  a  mixed  document  and  word  grouping 
approach  to  be  used  in  document  storage  and  retrieval. 


MISCELLANEOUS  APPLICATIONS 

The  MAXOF  Clustering  Model  has  been  used  in  several  profile  analysis 
studies.  For  example,  it  has  been  used  to  cluster  subjects  in  terms  of 
their test profiles (Ward & Hook, 1963). The model also has been used to
group  psychiatric  patients  in  terms  of  the  similarity  of  their  profiles 
on  personal  history,  socio-economic,  and  other  variables  considered  relevant 
to  diagnosis  and  prognosis. 

In  most  studies  of  this  type,  some  form  of  a  distance  function  is  used 
as  a  measure  of  similarity.  There  is  no  problem  in  applying  the  MAXOF 
Clustering  Model  to  group  things  or  people  so  as  to  minimize  the  distances 
or  squared  distances  among  items  within  clusters.  However,  distance  values 
are  at  best  rather  ambiguous  statements,  being  affected  by  the  number,  types, 
and  nature  of  variables  used  in  their  computation.  Distance  functions 
should  be  avoided  when  more  relevant  and  understandable  measures  of  similarity 
can  be  utilized.  Nevertheless,  there  are  occasions  when  more  meaningful  values 
cannot  be  defined.  This  being  the  case,  the  investigator  should  at  least 
exercise some control over the contribution of variables to the computed
distance  values.  One  approach  would  be  to  avoid  geometric  distances  altogether, 
and  to  substitute  measures  of  perceived  distances.  A  group  of  experts  in 
the discipline area could be provided with profile descriptions for a subsample
of the things or people to be clustered and asked to make direct
Judgments of the distances among them. The fixed-X multiple linear regression
model could then be employed to determine how difference scores on the
descriptive  variables  must  be  weighed  in  order  to  implement  the  policy  of 
these  experts.  This  equation  could  be  applied  to  determine  the  perceived 
distances  among  the  entire  sample  of  things  or  people  to  be  grouped.  Using 
this  input  matrix,  clusters  could  be  defined  which  are  likely  to  have  face 
validity,  since  they  contain  items  perceived  as  being  close  to  one  another 
by experts in the discipline area.


PROGRAM DESCRIPTIONS


The  Personnel  Research  Laboratory  has  two  sets  of  computer  programs 
available  for  applying  the  MAXOF  Clustering  Model.  The  most  elaborate 
set  contains  a  large  variety  of  options  for  computing  input  matrices, 
clustering,  and  generating  reports  of  results.  This  systems  package 
is designed for execution on an IBM 7040 computer which has a 32K core
memory, a 1301 disk file, six addressable tape units, an on-line 1402
reader-punch,  an  on-line  1403  printer,  and  an  inquiry  station.  A  few  of 
the  routines  in  this  package  will  be  described  below.  For  investigators 
who  do  not  have  access  to  a  computer  with  disk  storage,  a  special  set  of 
programs has been written which will accomplish profile, criterion, and
Job  clustering  on  a  smaller  scale.  Input  limitations  are  a  function  of 
core  size. 

The  profile  clustering  program  permits  grouping  of  1,000  cases.  Input 
is  normally  from  punched  cards,  and  may  include  up  to  928  words  of  background 
and  history  information  on  each  case  in  addition  to  the  profile  data.  Profile 
data  may  consist  of  score  values  on  up  to  928  variables.  Values  on  a 
particular  profile  variable  must  fit  into  a  field  of  eight  columns,  including 
the sign and decimal point, if required. The system reads and edits input
data,  assigns  case  numbers,  and  writes  data  on  tape.  Twelve  options  are 
available  for  computing  a  matrix  of  similarities  among  profiles.  These 
are  defined  in  the  appendix.  The  first  will  be  recognized  as  the  d2's 
computed from raw scores. The second is the d2's computed from standardized
scores.  The  standardization  routine  is  part  of  the  basic  program  and  is 
accomplished  automatically  if  option  2  is  selected.  Option  3  permits  the 
investigator to introduce weights to be applied to raw score variables.
It results in d2's computed from weighted raw scores. Option 4 produces
d2's computed from weighted standardized scores. Again, the investigator
provides the weights. Options 5 through 8 correspond to options 1 through 4,
except that their values are the positive square roots of the corresponding
values for options 1 through 4. Thus they might be defined as being
d's  rather  than  d2's.  Options  9  through  12  produce  summations  of  absolute 
differences  of  (a)  raw  scores,  (b)  standard  scores,  (c)  weighted  raw  scores, 
and  (d)  weighted  standard  scores,  respectively. 
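
The twelve options form a three-by-four layout: three kinds of value (d2, d, sum of absolute differences) crossed with four kinds of scoring. The sketch below follows that layout, but several details are assumptions not stated in the text: weights are applied by multiplying each score before differencing, standardization uses the population standard deviation, and the function names are ours.

```python
from math import sqrt

def standardize(profiles):
    """Convert each variable to z-scores (assumes no variable is constant)."""
    n, k = len(profiles), len(profiles[0])
    means = [sum(p[v] for p in profiles) / n for v in range(k)]
    sds = [sqrt(sum((p[v] - means[v]) ** 2 for p in profiles) / n)
           for v in range(k)]
    return [[(p[v] - means[v]) / sds[v] for v in range(k)] for p in profiles]

def similarity_matrix(profiles, option, weights=None):
    if not 1 <= option <= 12:
        raise ValueError("option must be 1 through 12")
    scoring = (option - 1) % 4   # 0 raw, 1 standardized, 2 weighted raw, 3 weighted std.
    family = (option - 1) // 4   # 0 d2, 1 d, 2 sum of absolute differences
    data = standardize(profiles) if scoring in (1, 3) else [list(p) for p in profiles]
    if scoring in (2, 3):
        if weights is None:
            raise ValueError("this option needs investigator-supplied weights")
        data = [[w * x for w, x in zip(weights, p)] for p in data]
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [a - b for a, b in zip(data[i], data[j])]
            if family == 2:
                value = sum(abs(d) for d in diffs)
            else:
                value = sum(d * d for d in diffs)
                if family == 1:
                    value = sqrt(value)
            matrix[i][j] = matrix[j][i] = float(value)
    return matrix

profiles = [[0.0, 0.0], [3.0, 4.0]]  # two hypothetical cases, two variables
print(similarity_matrix(profiles, 1)[0][1],
      similarity_matrix(profiles, 5)[0][1],
      similarity_matrix(profiles, 9)[0][1])
```

All twelve matrices are distances rather than overlaps, so a grouping rule applied to them would seek small values where the Job-typing application seeks large ones.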

Once the selected input matrix has been computed, it may be written on
tape or stored on disks ready for immediate grouping. The function of the
grouping  program  is  to  combine  or  "collapse"  two  rows  of  the  matrix  at  a 
time until only one final row remains. This collapsing is guided by an
"option" or objective function selected by the user. A total of six preprogrammed
options are available. The program also provides a way for the
user to code and insert his own objective function. Definitions of the preprogrammed
options are given in the appendix. Ordinarily, profile analysis
is accomplished using the maximizing function associated with options 1
or  2.  Thus  in  option  1,  groups  are  collapsed  so  as  to  maximize  the  average 
within-group  overlap  at  each  stage.  In  option  2,  which  we  have  found  best 
for  most  purposes  except  criterion  grouping,  the  program  simply  collapses 
the  two  groups  whose  members  are  most  similar;  that  is,  for  which  the 
average  pair-wise  between-groups  member  overlap  is  highest. 

After  the  grouping  has  been  accomplished,  the  program  enables  the 
investigator  to  publish  several  types  of  tables  displaying  the  results. 
First,  one  can  obtain  a  group  profile  description  for  any  group  existing 
at  any  stage.  The  format  of  a  group  profile  description  is  illustrated  in 
the appendix. Programs also are available for describing individuals in
any  group  in  terms  of  the  history  and  background  data.  One  can  request 
distributions,  means  and  standard  deviations  for  selected  background 
variables.  Many  other  types  of  displays  are  possible  which  cannot  be 
described  due  to  space  limitations.  It  is  suggested  that  anyone  desiring 
more  information  about  these  programs  write  to  one  of  the  authors. 

Input  to  the  criterion  grouping  program  is  in  the  form  of  beta  weights 
and validity coefficients. The appendix includes a note written by
Dr Robert A. Bottenberg which demonstrates how an input matrix of "pairwise
loss values" can be grouped with option 1 of the general grouping
program  in  a  manner  which  minimizes  over-all  loss  in  predictive  efficiency. 
In  the  case  of  equal  criterion  means  and  standard  deviations,  the  program 
can  handle  nearly  a  thousand  input  equations,  each  involving  a  common  set 
of  not  more  than  150  predictors.  Outputs  include  (a)  the  overall  R2  at 
each  stage,  (b)  the  set  of  raw  score  regression  weights  for  the  new  group 
formed  at  each  stage,  and  (c)  the  set  of  validity  coefficients  for  the  new 
group  formed  at  each  stage.  Again,  several  other  outputs  are  available, 
which  can  be  described  to  requestors. 

The  task  survey  analysis  programs  are  by  far  the  most  elaborate,  and 
cannot  be  described  in  this  paper. 


SUMMARY 

A  hierarchical  clustering  technique  has  been  described  which  is  designed 
to  group  people  or  things  into  mutually  exclusive  categories.  The  input 
matrix  of  overlap  values  may  take  any  form  which  the  investigator  selects 
as  representing  reality. 

The  model  begins  with  each  of  the  N  objects  in  a  separate  group.  The 
number  of  groups  is  reduced  by  one  at  each  stage,  until  all  objects  are  in 
a  single  group.  Choice  of  the  two  groups  to  be  collapsed  at  a  given  stage 
is determined by considering all possibilities and selecting that one which
best satisfies an objective function previously established by the
investigator. Thus, the model groups objects into every possible number
of  mutually  exclusive  clusters,  from  N  to  1.  The  investigator  decides 
on  the  appropriate  number  of  clusters  to  report  by  considering  relevant 
factors. 

Applications of the model described in the paper include: (a) grouping
jobs in a manner which minimizes average cross-training time among jobs
within clusters; (b) defining a large number of jobs with a smaller number of
consolidated job descriptions in a manner which maintains maximum descriptive
accuracy; (c) clustering technical schools into families and producing
associated prediction equations so as to maintain maximum predictive
efficiency; (d) clustering judges in terms of the homogeneity of their
policy equations, and producing composite equations for each group accepted;
(e) establishing a taxonomy of Latin American tapioca plants; (f) grouping
tropical fish in terms of the similarity of their eating habits; (g)
grouping reading areas; (h) grouping scientists in terms of the similarity
of their reading interests; (i) document grouping; (j) task clustering;
and (k) profile analysis.


APPENDIX


Contents  of  Appendix  are  as  follows,  listed  in  order  of  appearance: 

1.  Requirements  for  computer  evaluation  of  objective  function  on 
pooling  a  large  number  of  objects. 

2.  Definition  of  input  matrices  for  profile  grouping. 

3.  Format  for  description  of  group  profile. 

4.  Definition of grouping process and collapsing formulas.

5.  Conditions for use of options 4 and 6 for grouping in terms
of squared multiple correlation coefficients.

6.  Listing of top tasks from six Medical Laboratory Technician job
type descriptions.


SUBJECT: Requirements for Computer Evaluation of Objective Function on Pooling a
Large Number of Objects


TO: PRB (Dr. Christal)

1.  The most general statement of the problem is to determine an estimate
of machine time required to perform the evaluations for all possible groupings
of 1,000 objects. The enumeration of all such groupings appears to be
a  difficult  problem.  Therefore,  what  follows  is  limited  to  the  enumeration 
of  a  subset  of  these  groupings.  Any  time  estimate  made  on  the  basis  of 
evaluating  the  objective  function  for  all  groupings  in  this  subset  will  be 
a  gross  underestimate  of  the  time  requirement  for  grouping  in  all  possible 
ways.  The  subset  in  question  contains  any  grouping  such  that:  a.  there  are 
500  groups,  and  b.  there  are  exactly  two  objects  in  each  group.  Define  this 
subset  as  S. 


2.  To enumerate the groupings of 1,000 objects into 500 partitions of two
objects each, first consider the simple problem of enumerating the groupings
of 4 objects into two partitions of two objects each. There are three
such groupings: (1,2:3,4), (1,3:2,4), and (1,4:2,3). Note that the lead
element in the first partition is the i.d. number 1, and we can make this
true for any arbitrary arrangement of i.d. numbers into partitions by reordering
the object i.d. numbers within a partition and the order of the partitions
without in any way altering the unique grouping. Next consider grouping 6
objects into three partitions of two objects each. Again let the i.d. number
1 be the lead element of the first partition. There are five other i.d.
numbers which can be used to fill out the first partition. Then, as in the
case of grouping four objects into two partitions of two each, there are
three ways of partitioning the remaining four objects for each of the five
ways of completing the first partition. So there are 5 x 3 = 15 ways of
grouping 6 objects into three partitions of two objects each. The same
general argument holds for the problem of putting 8 objects into four partitions
of two each. There are 7 i.d. numbers which can be used to complete
the first partition after assigning i.d. number 1 to the lead element of the
first partition. For each of the 7 ways of completing partition 1, there
are 5 x 3 = 15 ways of assigning the remaining 6 objects to three partitions
of two objects each. It can be seen by induction that for grouping 1,000
objects into 500 partitions of two objects each, there are 999 x 997 x ... x 3 x 1
different ways, $= 1000!/(500! \times 2^{500})$ = approximately $10^{1289}$.

3.  There are approximately $3 \times 10^{7}$ seconds per year. Assuming that the cycle
time for a computer is about $10^{-9}$ seconds, and that one evaluation of the objective
function could be performed on each machine cycle, then a computer operating
for one year could compute $3 \times 10^{16}$ evaluations. If the total number of
evaluations were split up so that separate computers could perform different
subsets of the evaluations, it would require $10^{1289}/(3 \times 10^{16})$ = approximately
$10^{1272}$ computers running continuously for one year to evaluate all the objective
functions for the subset S.
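
The counting argument in paragraphs 2 and 3 can be checked directly. The following is a minimal sketch, not part of the original memo: the function names `pairings` and `computer_years` are hypothetical, and the evaluation rate of $10^9$ per second is the assumption read from the memo's cycle time.

```python
import math


def pairings(n_objects):
    """Number of ways to split an even number of objects into unordered pairs.

    Follows the memo's induction: the lead object can be completed in
    (n - 1) ways, the next remaining lead object in (n - 3) ways, and so on.
    """
    if n_objects % 2:
        raise ValueError("need an even number of objects")
    count = 1
    for choices in range(n_objects - 1, 0, -2):  # (n-1)(n-3)...(3)(1)
        count *= choices
    return count


def computer_years(evaluations, per_second=10**9, seconds_per_year=3 * 10**7):
    """Whole computer-years needed at one evaluation per machine cycle (assumed rate)."""
    return evaluations // (per_second * seconds_per_year)


small_cases = [pairings(n) for n in (4, 6, 8)]   # the memo's worked cases
big = pairings(1000)                             # size of the subset S
magnitude = len(str(big)) - 1                    # power of ten of the count
years_magnitude = len(str(computer_years(big))) - 1
print(small_cases, magnitude, years_magnitude)
```

The small cases reproduce the 3 and 15 groupings worked out in paragraph 2, and the closed form $1000!/(500! \times 2^{500})$ agrees exactly with the running product. Evaluated this way, the count for 1,000 objects comes out near $10^{1283}$, somewhat below the rounded figure quoted in the memo; the conclusion that exhaustive evaluation is out of reach is unaffected.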

ROBERT A. BOTTENBERG

Chief, Mathematical and Statistical
Analysis Branch


DEFINITION OF INPUT MATRICES FOR PROFILE GROUPING

A.  Input. (1) A tape file or deck of cards containing scores on
a set of p variables for each of n cases.

(2)  A  set  of  p  weights,  one  for  each  of  the  p  variables. 
When  weights  are  unspecified,  they  are  assumed  to  be  equal  to  1.00. 

B.  Functions of the Program. This program computes a matrix of
profile similarities, ready for input in the PRL grouping programs.
The input matrix is symmetric and contains $n^2$ values. The elements of
the matrix are computed according to one of the twelve computing
expressions listed below, depending upon the option selected.

C.  Definition of Terms.

Let $x_{ij}$ = score on variable j for person i, or the jth element
in  the  ith  record.  1000,  i*1000 
n = the upper value for i.

p = the upper value for j.

$A_k$ = weight to be applied to variable k.

$\sigma_k$ = standard deviation of variable k.


D.  Computing expressions. Twelve options are programmed
as follows:

1.  $DU_{ij} = \left[\sum_{k=1}^{p} (x_{ik} - x_{jk})^2\right]^{1/2}$ = d computed from raw scores

2.  $DUS_{ij}$ = d using standardized scores

3.  $DW_{ij}$ = d computed from weighted raw scores

4.  $DWS_{ij}$ = d computed from weighted standard scores

5.  $DU^2_{ij} = (DU_{ij})^2$

6.  $DUS^2_{ij} = (DUS_{ij})^2$

7.  $DW^2_{ij} = (DW_{ij})^2$

8.  $DWS^2_{ij} = (DWS_{ij})^2$

9.  $ADU_{ij} = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$ = Summation of absolute differences in raw scores

10. $ADUS_{ij}$ = Summation of absolute differences of standard scores

11. $ADW_{ij}$ = Summation of weighted absolute differences of
raw scores

12. $ADWS_{ij}$ = Summation of weighted absolute differences in
standardized scores.

NOTE: The general program is written so that if options 1 and 5
are required simultaneously, the program will obtain both matrices at
one time; similarly for options 2 and 6, 3 and 7, and 4 and 8.
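
The families of computing expressions above can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the PRL program: the names `standardize` and `distance_matrix` are hypothetical, standard scores use the population standard deviation, and each weight is applied once to its variable's squared or absolute difference (the original placement of the weights is not recoverable from the text).

```python
import math


def standardize(scores):
    """Convert each variable (column) to standard scores; population s.d. is assumed."""
    n, p = len(scores), len(scores[0])
    means = [sum(row[k] for row in scores) / n for k in range(p)]
    sds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in scores) / n)
           for k in range(p)]
    return [[(row[k] - means[k]) / sds[k] for k in range(p)] for row in scores]


def distance_matrix(scores, weights=None, squared=False, absolute=False):
    """Symmetric n x n matrix of profile distances between cases.

    absolute=False gives d (or d squared when squared=True);
    absolute=True gives the sum of absolute differences.
    """
    n, p = len(scores), len(scores[0])
    a = weights if weights is not None else [1.0] * p  # unspecified weights are 1.00
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [scores[i][k] - scores[j][k] for k in range(p)]
            if absolute:
                d = sum(a[k] * abs(diffs[k]) for k in range(p))
            else:
                d = sum(a[k] * diffs[k] ** 2 for k in range(p))
                if not squared:
                    d = math.sqrt(d)
            out[i][j] = out[j][i] = d
    return out


scores = [[1.0, 2.0], [4.0, 6.0], [1.0, 6.0]]        # three cases, two variables
du = distance_matrix(scores)                         # option 1
du2 = distance_matrix(scores, squared=True)          # option 5
adu = distance_matrix(scores, absolute=True)         # option 9
dus = distance_matrix(standardize(scores))           # option 2
```

For the first two cases the raw differences are 3 and 4, so option 1 gives 5, option 5 gives 25, and option 9 gives 7. Computing the squared and unsquared forms from the same differences is what allows paired options to be produced in one pass.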
DEFINITION OF GROUPING PROGRAM AND
COLLAPSING FORMULAS

FUNCTION: To combine rows of a matrix (stored on disk), two at a time, until
only one final value remains. This process is called "collapsing
the matrix." Two methods are available for selection of the
sequence of rows to be combined:

1.  MAXIMIZING Process: The largest value $V_{ij}$ in the
matrix is searched for each time, and when found, the
indices of its position in the matrix (i and j) become the
two rows to be combined. Thus, if the numerically
largest matrix element is in the 123rd cell of row 45, then
row 123 will be combined with row 45.

2.  MINIMIZING  Process:  Similar  to  the  maximizing  process 
except  the  smallest  value  is  searched  for  each  time.  Once 
either  the  minimizing  or  maximizing  process  is  selected 
(via  control  card)  it  remains  in  effect  for  all  collapses  of 
the  entire  matrix. 

The  value  selected  to  determine  each  collapse  is  called  the  BEST 
value  for  that  collapse;  each  collapse  is  called  a  STAGE.  The 
two  row  numbers  are  called  the  IBEST  and  JBEST  indices  for 
that STAGE. The rows for each STAGE are combined together
according  to  a  pre-determined  formula.  It  can  be  shown  that 
after  a  combination,  no  value  can  be  generated  which  is  greater 
than  BEST  for  that  stage  if  maximizing,  or  smaller  than  BEST 
if  minimizing.  The  row  indicated  by  the  larger  index  is  always 
collapsed  into  the  row  with  the  smaller  index.  Hence,  if  BEST 
is  found  at  123  and  45,  then  the  new  values  generated  will  be 
restored  into  row  45,  and  row  123  will  be  considered  deleted 
from the matrix. If the matrix was m x m to start, then the
first collapse is called STAGE (m - 1), the next is called
STAGE (m - 2), down to STAGE 1. The original matrix is destroyed
in the process.

The original matrix normally was created by one of the OVERLAP
programs. Its values really form a "triangular" matrix, but it
was found that by reflecting the values on the other side of the
diagonal, the grouping program could be made to operate
very rapidly. This is done by a "delayed updating" process
developed in contract AF 41-(609)-1982. Details will not be
repeated here. Actually, a table of BEST values is maintained
in core so as to avoid searching the entire matrix on disk for
each collapse. The BEST table, the generated weights (number
of rows combined into a given row), and the order of collapses are
also maintained by the program and used as part of the technique.

COLLAPSING  FORMULAS: 

The  user  must  choose  one  of  these  and  punch  its  identification 
number  on  a  control  card.  The  entire  collapse  of  the  total 
matrix  then  occurs  according  to  the  chosen  option.  In  all  the 
below: 

i = Lower numbered row of any pair of rows being
combined.
New values are restored into row i.

j = Higher numbered row of any pair of rows being
combined.
Row j is then deleted from matrix.

k = Successive values in a row, where k = 1, 2, ..., m
and m is order of the matrix, except no value
is computed for the diagonal element k = i.

$V_{ki}$ = Old values obtained from row i.

$V_{kj}$ = Old values obtained from row j.

$V'_{ki}$ = New value due to the combination and restored
into position k of row i.

$W_x$ = Weight of a row, where x can be i, j, or k
depending on the option selected.
At the final stage of collapse, the weight value
will be equal to the sum of the weights of the rows of the
matrix.

Collapse Option 1

$V'_{ki} = \dfrac{V_{ki}(W_i + W_k) + V_{kj}(W_j + W_k) - V_{ij} W_k}{W_i + W_j + W_k}$

Collapse Option 2

$V'_{ki} = \dfrac{V_{ki} W_i + V_{kj} W_j}{W_i + W_j}$

Collapse Option 3 (when MAXIMIZING)

$V'_{ki}$ = larger of $V_{ki}$ and $V_{kj}$

Collapse Option 3 (when MINIMIZING)

$V'_{ki}$ = smaller of $V_{ki}$ and $V_{kj}$

Collapse Option 4

$V'_{ki}$ = A value generated by user written code incorporated
in the GROUP program.

In all options, the new weight of row i (designated $W'_i$) will be the
sum of the old $W_i$ plus $W_j$.
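
The collapsing procedure and the first three collapse options can be sketched as follows. This is an in-memory illustration under stated assumptions, not the GROUP program: the function name `collapse` is hypothetical, rows are indexed from 0, the disk storage and "delayed updating" are omitted, and ties are resolved by ordinary tuple comparison.

```python
def collapse(matrix, option=2, maximize=False, weights=None):
    """Collapse a symmetric matrix two rows at a time.

    Returns a log of (stage, i, j, best) tuples, where row j (the larger
    index) is collapsed into row i (the smaller index) at each stage.
    """
    m = len(matrix)
    v = [row[:] for row in matrix]                    # the input is left intact here
    w = list(weights) if weights is not None else [1.0] * m
    active = list(range(m))                           # rows not yet deleted
    pick = max if maximize else min
    log = []
    for stage in range(m - 1, 0, -1):                 # STAGE (m - 1) down to STAGE 1
        best, i, j = pick((v[a][b], a, b) for a in active for b in active if a < b)
        for k in active:
            if k == i or k == j:
                continue
            if option == 1:
                new = ((v[k][i] * (w[i] + w[k]) + v[k][j] * (w[j] + w[k])
                        - v[i][j] * w[k]) / (w[i] + w[j] + w[k]))
            elif option == 2:
                new = (v[k][i] * w[i] + v[k][j] * w[j]) / (w[i] + w[j])
            elif option == 3:
                new = max(v[k][i], v[k][j]) if maximize else min(v[k][i], v[k][j])
            else:
                raise ValueError("option must be 1, 2, or 3 in this sketch")
            v[k][i] = v[i][k] = new                   # restore new value into row i
        w[i] += w[j]                                  # new weight is the sum of the old weights
        active.remove(j)                              # row j is deleted from the matrix
        log.append((stage, i, j, best))
    return log


distances = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
stages = collapse(distances, option=2)                # MINIMIZING process
print(stages)
```

With this four-object matrix and option 2, rows 0 and 1 combine first (BEST = 1), then rows 2 and 3 (BEST = 2), and the final BEST of 4.5 is the mean of the four distances between the two pairs, which is what the weighted-average formula of option 2 is designed to track.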
SPECIAL CALCULATIONS

TASK INVENTORY and/or PROFILE ANALYSIS

AVERAGE WITHIN = $V'_{ii}$

$V'_{ii} = \dfrac{V_{ii} W_i^2 + V_{jj} W_j^2 + 2 V_{ij} W_i W_j}{(W_i + W_j)^2}$

WHERE:

$V_{ii}$ was the previous "average within" for all rows collapsed
into row i (usually, starts at 100%).

$V_{jj}$ is like $V_{ii}$ except for row j.

$V_{ij}$ is BEST, the value used as the criterion to select these
two rows for combination.

$W_i$ and $W_j$ are the weights of the respective rows before
combining.
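
A small numerical illustration of the AVERAGE WITHIN update may help. It is a sketch only: the function name `average_within` and the overlap values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def average_within(v_ii, v_jj, v_ij, w_i, w_j):
    """Updated average-within value after row j is collapsed into row i."""
    return ((v_ii * w_i ** 2 + v_jj * w_j ** 2 + 2 * v_ij * w_i * w_j)
            / (w_i + w_j) ** 2)


# Two single rows, each starting at 100%, combined at BEST = 60%.
first = average_within(100.0, 100.0, 60.0, 1, 1)
# A third single row joins the pair at BEST = 50%.
second = average_within(first, 100.0, 50.0, 2, 1)
print(first, second)
```

Under these assumed values the first merge gives 80, the mean of the four cells (two diagonal, two off-diagonal) of the 2 x 2 block, and the second gives 620/9, the mean of the nine cells of the 3 x 3 block. Read this way, the formula averages every cell of the within-cluster block, diagonal cells included.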


REGRESSION EQUATION ANALYSIS


When collapse Option 1 is utilized, an $R^2$ value will be computed.
For the initial collapse $R^2_0$ will be computed in OVLAP1 by the
formula:

$R^2_0 = \dfrac{\sum_j W_j r_{jj}}{\sum_j W_j}$

where $r_{jj}$ was a calculated value of the squared multiple
correlation coefficient.

For every collapse thereafter, $R^2_k$ will be computed by the
formula:

$R^2_k = R^2_{k+1} - V_{ij}$

where $R^2_k$ is the overall $R^2$ at stage k and $V_{ij}$ is the matrix
element located by the i and j indices.

The $R^2$ value is the output in the place of AVERAGE WITHIN.


CONDITIONS  FOR  USE  OF  OPTIONS  4  AND  6 
FOR  GROUPING  IN  TERMS  OF  SQUARED 
MULTIPLE  CORRELATION  COEFFICIENTS 


1.  General. Assume an input matrix, V, exists. Both options
4 and 6 will use V, seek the smallest value in V, and then
update. The matrix V is assumed to be symmetric. If the
minimum value in V is the element $v_{ij}$ where i is less than j,
then row and column i are updated and the existing row and
column j will be disregarded in subsequent operations. After
the elements in row and column i have been updated, the weight
$w_i$ is updated and the existing weight $w_j$ is subsequently
disregarded. The expression for updating element k in column
i depends on the option. For option 4,

$v'_{ki} = (v_{ik} w_i + v_{jk} w_j)/(w_i + w_j)$, where i,j identify the position
of the smallest element, $v_{ij}$, in V;

for option 6,

$v'_{ki} = (v_{ik}(w_i + w_k) + v_{jk}(w_j + w_k) - v_{ij} w_k)/(w_i + w_j + w_k)$,
where i,j identifies the minimum in matrix V.

For both options, the updated $w_i$ is given by $w'_i = w_i + w_j$.

2.  Assumptions.

a.  Proportionality of sums of squares and cross-products
of predictor matrices between the initial groups. Equality of
predictor intercorrelation matrices for initial groups is
necessary but not sufficient, since the solution for a set of
beta weights in combined groups would involve the predictor
correlation matrix for the combined group, and even though
these matrices are equal for the separate groups, the combined
group predictor correlation matrix will not in general be a
weighted sum of the separate matrices unless the sums of
squares and cross-products matrices for separate groups are
proportional. Proportionality also implies that the predictor
mean for a given variable is constant from group to group,
and similarly for the predictor s.d.

b.  Equality of criterion variable means across initial
groups.

c.  Equality of criterion variable s.d. across initial
groups.

3.  Definitions.

a.  $B_i$, a row vector of beta weights (standard partial
weights) in which the (p)th element is the weight for predictor
p in initial group i.

b.  B, a matrix in which the rows are the $B_i$.

c.  $T_j$, a column vector of validity coefficients in which
the (p)th element is the validity of predictor p for the
criterion in initial group j.

d.  T, a matrix in which the columns are the $T_j$.

e.  R, a square matrix, BT. R is theoretically symmetrical
but will, in general, fail to be symmetrical due to inaccuracy
in solving for the $B_i$.

f.  $w_i$, weights. Initial values of the $w_i$ are set at the
corresponding $N_i$ (number of criterion observations) for option
6 with unequal N and set at 1 when option 6 is used for an
equal N case and for option 4.

g.  w, the sum over i of initial values of $w_i$.

h.  V, a symmetric matrix in which the element $v_{ij}$ is
obtained from elements $r_{ii}$, $r_{jj}$, $r_{ij}$, and $r_{ji}$ of matrix R.
$v_{ij} = (1/w)(w_i w_j (r_{ii} + r_{jj} - r_{ij} - r_{ji}))/(w_i + w_j)$, where
the values of $w_i$ and $w_j$ are the initial values.
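
The construction of R and V from the beta weights and validities can be sketched in a few lines. This is an illustration only: the function name `loss_matrix` is hypothetical, each $T_j$ is passed as a plain list (one per group, i.e. the columns of T), and the numbers are made up for the example.

```python
def loss_matrix(B, T, weights):
    """Build R = BT and the input matrix V of item h.

    B[i] is the beta-weight vector of initial group i; T[j] is the
    validity vector of initial group j; weights[i] is the initial w_i.
    """
    g = len(B)
    r = [[sum(b * t for b, t in zip(B[i], T[j])) for j in range(g)]
         for i in range(g)]
    w = sum(weights)                                   # item g: sum of initial weights
    v = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                v[i][j] = (weights[i] * weights[j]
                           * (r[i][i] + r[j][j] - r[i][j] - r[j][i])
                           / (w * (weights[i] + weights[j])))
    return r, v


B = [[0.5, 0.2], [0.3, 0.4]]      # hypothetical beta weights, two groups, two predictors
T = [[0.6, 0.3], [0.4, 0.5]]      # hypothetical validity coefficients
r, v = loss_matrix(B, T, [1, 1])
print(r, v)
```

Here $r_{11} = 0.36$ and $r_{22} = 0.32$ are the two within-group squared multiples, both cross terms equal 0.30, and the single off-diagonal element of V is 0.02. V is symmetric by construction even when R is not, because $r_{ij}$ and $r_{ji}$ enter only through their sum.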


4.  Proof that option 6 combines groups so as to minimize
loss in over-all predictive efficiency, given the input matrix
V and the updating procedure described in paragraph 1.

Method: (1) Assume that grouping has occurred and that at the
end of this stage the element $v_{ij}$ is found to be best (minimum)
in the updated matrix; (2) That $v_{ij}$, $v_{ik}$, and $v_{jk}$ are the
over-all loss in predictive efficiency when the i,j cluster
is combined, when the i,k cluster is combined, and when the
j,k cluster is combined respectively; (3) Then to show that
$v'_{ki}$ as given by the updating expression will be the over-all
loss in predictive efficiency when the i,j cluster is combined
with cluster k; and (4) To show that the initial values in the
V matrix represent the over-all loss in predictive efficiency
when two of the initial groups are combined. If (3) and (4)
are demonstrated, then by induction the elements in matrix V
at the end of any grouping stage will be the loss in predictive
efficiency, over all, when the two clusters identified
by the subscripts are combined.

a.  Denote the squared multiple for clusters i, j, and k;
cluster i,j; cluster i,k; cluster j,k; and cluster i,j,k as
$R^2_i$, $R^2_j$, $R^2_k$, $R^2_{i,j}$, $R^2_{i,k}$, $R^2_{j,k}$, and $R^2_{i,j,k}$. Denote the updated
values of the weights for clusters i, j, and k after s stages
as $W_i$, $W_j$, $W_k$, and $w_f$, ..., $w_g$, $w_h$, etc. as the weights associated
with the initial group indicated by the subscript.

b.  The over-all predictive efficiency after s stages is
$(1/w)(W_i R^2_i + W_j R^2_j + W_k R^2_k + C)$, where C is the weighted sum of
squared multiples for other clusters.

c.  The over-all predictive efficiency, if at the next
stage clusters i and j were combined, would be

$(1/w)((W_i + W_j) R^2_{i,j} + W_k R^2_k + C)$.

d.  The corresponding loss in over-all predictive efficiency
is $(1/w)(W_i R^2_i + W_j R^2_j - (W_i + W_j) R^2_{i,j})$.

e.  By analogy, the loss in over-all predictive efficiency,
if i and k are combined after stage s, is

$(1/w)(W_i R^2_i + W_k R^2_k - (W_i + W_k) R^2_{i,k})$.

f.  By analogy, the loss in over-all predictive efficiency,
if j and k were combined after stage s, is

$(1/w)(W_j R^2_j + W_k R^2_k - (W_j + W_k) R^2_{j,k})$.

g.  By analogy, the loss in over-all predictive efficiency,
if at stage s+2 cluster k is combined with the i,j cluster
which is assumed to have been combined at stage s+1, is

$(1/w)((W_i + W_j) R^2_{i,j} + W_k R^2_k - (W_i + W_j + W_k) R^2_{i,j,k})$.

h.  Now assume that element $v_{ij}$ contains the quantity
shown in step d after s stages, $v_{ik}$ contains the quantity
in step e, $v_{jk}$ contains the quantity shown in step f, and
that $v_{ij}$ is the minimum in the updated V matrix. Then show
that the updating expression

$v'_{ki} = (1/(W_i + W_j + W_k))(v_{ik}(W_i + W_k) + v_{jk}(W_j + W_k) - v_{ij} W_k)$

will yield the value of the quantity shown in step g.

i.  Substituting their assumed values for $v_{ik}$, $v_{jk}$, and $v_{ij}$,

$v'_{ki} = \dfrac{1}{w(W_i + W_j + W_k)}\Big(W_i^2 R^2_i + W_j^2 R^2_j + W_k(W_i + W_j + 2W_k) R^2_k - (W_i + W_k)^2 R^2_{i,k} - (W_j + W_k)^2 R^2_{j,k} + W_k(W_i + W_j) R^2_{i,j}\Big)$.

j.  Denote ${}_sB_i$ the vector of beta weights for cluster i
after s stages, ${}_sB_j$ and ${}_sB_k$ similarly; ${}_sT_i$, ${}_sT_j$, ${}_sT_k$ are vectors
of validity coefficients after stage s for clusters i, j, and
k respectively.

k.  Let

${}_{s+1}B_{i,j}$ be the vector of beta weights for the combined i,j
cluster, if clusters i,j were combined on stage s+1, and by
proportionality assumption $= (1/(W_i + W_j))(W_i \cdot {}_sB_i + W_j \cdot {}_sB_j)$;

${}_{s+1}B_{i,k}$ be the vector of beta weights for the combined i,k
cluster, if on stage s+1 clusters i and k are combined,
$= (1/(W_i + W_k))(W_i \cdot {}_sB_i + W_k \cdot {}_sB_k)$;

${}_{s+1}B_{j,k}$ be the vector of beta weights for the combined j,k
cluster, if on stage s+1 clusters j and k are combined,
$= (1/(W_j + W_k))(W_j \cdot {}_sB_j + W_k \cdot {}_sB_k)$;

${}_{s+2}B_{i,j,k}$ be the vector of beta weights for the combined i,j,k
cluster, if on stage s+1 clusters i and j are combined and on
stage s+2 cluster k is combined with the i,j cluster formed on
stage s+1, $= (1/(W_i + W_j + W_k))(W_i \cdot {}_sB_i + W_j \cdot {}_sB_j + W_k \cdot {}_sB_k)$;

and define vectors of validity coefficients similarly.

l.  Then,

$R^2_i = {}_sB_i \cdot {}_sT_i$;

$R^2_j = {}_sB_j \cdot {}_sT_j$;

$R^2_k = {}_sB_k \cdot {}_sT_k$;

$R^2_{i,j} = {}_{s+1}B_{i,j} \cdot {}_{s+1}T_{i,j}$, and substituting the s stage vectors
for the s+1 stage vectors as in step k,
$= (1/(W_i + W_j)^2)(W_i^2 \cdot {}_sB_i \cdot {}_sT_i + W_j^2 \cdot {}_sB_j \cdot {}_sT_j + W_i W_j \cdot {}_sB_i \cdot {}_sT_j + W_i W_j \cdot {}_sB_j \cdot {}_sT_i)$;

$R^2_{i,k} = {}_{s+1}B_{i,k} \cdot {}_{s+1}T_{i,k}$ or
$= (1/(W_i + W_k)^2)(W_i^2 \cdot {}_sB_i \cdot {}_sT_i + W_k^2 \cdot {}_sB_k \cdot {}_sT_k + W_i W_k \cdot {}_sB_i \cdot {}_sT_k + W_i W_k \cdot {}_sB_k \cdot {}_sT_i)$;

$R^2_{j,k} = {}_{s+1}B_{j,k} \cdot {}_{s+1}T_{j,k}$ or
$= (1/(W_j + W_k)^2)(W_j^2 \cdot {}_sB_j \cdot {}_sT_j + W_k^2 \cdot {}_sB_k \cdot {}_sT_k + W_j W_k \cdot {}_sB_j \cdot {}_sT_k + W_j W_k \cdot {}_sB_k \cdot {}_sT_j)$; and

$R^2_{i,j,k} = {}_{s+2}B_{i,j,k} \cdot {}_{s+2}T_{i,j,k}$ or
$= (1/(W_i + W_j + W_k)^2)(W_i^2 \cdot {}_sB_i \cdot {}_sT_i + W_j^2 \cdot {}_sB_j \cdot {}_sT_j + W_k^2 \cdot {}_sB_k \cdot {}_sT_k + W_i W_j \cdot {}_sB_i \cdot {}_sT_j + W_i W_j \cdot {}_sB_j \cdot {}_sT_i + W_i W_k \cdot {}_sB_i \cdot {}_sT_k + W_i W_k \cdot {}_sB_k \cdot {}_sT_i + W_j W_k \cdot {}_sB_j \cdot {}_sT_k + W_j W_k \cdot {}_sB_k \cdot {}_sT_j)$.

m.  Substituting from step l the expressions for $R^2_i$, $R^2_j$,
$R^2_k$, $R^2_{i,j}$, $R^2_{i,k}$, and $R^2_{j,k}$ into the expression for $v'_{ki}$ in
step i,

$v'_{ki} = \dfrac{1}{w(W_i + W_j + W_k)}\Big(\dfrac{W_k W_i^2 \cdot {}_sB_i \cdot {}_sT_i}{W_i + W_j} + \dfrac{W_k W_j^2 \cdot {}_sB_j \cdot {}_sT_j}{W_i + W_j} + W_k(W_i + W_j) \cdot {}_sB_k \cdot {}_sT_k + \dfrac{W_i W_j W_k \cdot {}_sB_i \cdot {}_sT_j}{W_i + W_j} + \dfrac{W_i W_j W_k \cdot {}_sB_j \cdot {}_sT_i}{W_i + W_j} - W_i W_k \cdot {}_sB_i \cdot {}_sT_k - W_i W_k \cdot {}_sB_k \cdot {}_sT_i - W_j W_k \cdot {}_sB_j \cdot {}_sT_k - W_j W_k \cdot {}_sB_k \cdot {}_sT_j\Big)$.

n.  Substituting from step l the expressions for $R^2_{i,j}$,
$R^2_k$, and $R^2_{i,j,k}$ into the expression for loss in over-all predictive
efficiency in step g and evaluating, the quantity is
identical to the value of $v'_{ki}$ derived in step m.

o.  Therefore, if it is assumed that the updated V matrix
after s stages contains elements which are the loss in over-all
predictive efficiency which would be required at stage s+1 if
the two clusters indicated by the row and column subscripts
of an element were combined, then the expression for updating
the elements in the column and row in which the minimum is
found will, in fact, give the loss in over-all predictive
efficiency when some other cluster, k, is combined on stage
s+2 with the combined i,j cluster formed at stage s+1.
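
The identity established in steps h through o can be spot-checked numerically. The sketch below is not part of the original note: the cluster values are hypothetical, a cluster is represented as a (weight, beta vector, validity vector) tuple, and exact rational arithmetic is used so that the two sides can be compared without rounding error.

```python
from fractions import Fraction as F


def r2(cluster):
    """Squared multiple of a cluster: its beta vector times its validity vector."""
    _, b, t = cluster
    return sum(x * y for x, y in zip(b, t))


def merge(a, b):
    """Pool two clusters using the weighted averages of step k."""
    wa, ba, ta = a
    wb, bb, tb = b
    tot = wa + wb
    return (tot,
            [(wa * x + wb * y) / tot for x, y in zip(ba, bb)],
            [(wa * x + wb * y) / tot for x, y in zip(ta, tb)])


def loss(a, b, w):
    """Loss in over-all predictive efficiency when a and b are combined (steps d to g)."""
    ab = merge(a, b)
    return (a[0] * r2(a) + b[0] * r2(b) - ab[0] * r2(ab)) / w


# Hypothetical clusters after s stages: (W, beta weights, validities).
ci = (F(3), [F(1, 2), F(1, 5)], [F(3, 5), F(3, 10)])
cj = (F(2), [F(3, 10), F(2, 5)], [F(2, 5), F(1, 2)])
ck = (F(4), [F(1, 10), F(3, 5)], [F(1, 5), F(7, 10)])
w = F(20)                                   # sum of all initial weights (assumed)

v_ij, v_ik, v_jk = loss(ci, cj, w), loss(ci, ck, w), loss(cj, ck, w)
Wi, Wj, Wk = ci[0], cj[0], ck[0]

# Updating expression of step h versus the direct loss of step g.
updated = (v_ik * (Wi + Wk) + v_jk * (Wj + Wk) - v_ij * Wk) / (Wi + Wj + Wk)
direct = loss(merge(ci, cj), ck, w)
print(updated == direct)
```

The two quantities agree exactly for these values, as the algebra of steps m and n requires. The check relies only on the squared multiple being a product of pooled vectors, so other choices of vectors and weights should behave the same way.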

p.  To prove that the input matrix V contains elements
which are the loss in over-all predictive efficiency when a
pair of initial groups are combined, let $A^2_{i,j}$ be the squared
multiple for the two group cluster consisting of initial
groups i and j. The over-all predictive efficiency for the
full interaction model is $(1/w)(w_i r_{ii} + w_j r_{jj} + K)$, where K
is the weighted sum of other elements on the diagonal of
matrix R. The over-all predictive efficiency, when i and j
are combined, is

$(1/w)((w_i + w_j) A^2_{i,j} + K)$;

and the loss in over-all predictive efficiency is

$(1/w)(w_i r_{ii} + w_j r_{jj} - (w_i + w_j) A^2_{i,j})$.

The vector of beta weights for the i,j cluster is, because
of proportionality,

$(1/(w_i + w_j))(w_i B_i + w_j B_j)$;

and the validity vector for the i,j cluster is

$(1/(w_i + w_j))(w_i T_i + w_j T_j)$;

hence,

$A^2_{i,j} = (1/(w_i + w_j)^2)(w_i^2 B_i T_i + w_j^2 B_j T_j + w_i w_j B_i T_j + w_i w_j B_j T_i)$

or $= (1/(w_i + w_j)^2)(w_i^2 r_{ii} + w_j^2 r_{jj} + w_i w_j r_{ij} + w_i w_j r_{ji})$.

Substituting this value in the expression for the loss in
over-all predictive efficiency,

loss $= (w_i w_j/(w(w_i + w_j)))(r_{ii} + r_{jj} - r_{ij} - r_{ji})$,

which is also the expression for the element $v_{ij}$ in the input
matrix (see paragraph 3, item h).

q.  Note: If all the initial group n's are equal, the
initial values of $w_i$ could either all be set equal to this
common value or the $w_i$ all set equal to 1 initially. In
the latter case, the $W_i$, etc., would be just the number of
initial groups currently contained in cluster i and w would
be the number of initial groups. The expressions for the
vectors of beta weights and validity coefficients would
still be correct.

5.  Proof that option 4 groups so as to minimize the average
pairwise loss in predictive efficiency, given an input matrix
V and the expression for updating elements in the row and
column in which the minimum of the updated V matrix is located,
$v'_{ki} = (v_{ik} w_i + v_{jk} w_j)/(w_i + w_j)$, where $v_{ij}$ is the minimum in V.

Definition: A pairwise loss in predictive efficiency is the
reduction in $R^2$ when an interaction model for only two of the
initial groups is reduced to a common model for the two groups.
If at the end of s stages there are $W_i$ of the initial groups
in cluster i and $W_k$ of the initial groups in cluster k, then
the average loss being considered is the average of $W_i \cdot W_k$
losses, where each loss reflects the combining of one member
of cluster i with one member of cluster k. Method: (1)
Assume that at the end of s stages the V matrix has been
updated so that $v_{ij}$ contains 2/w times the average pairwise
loss which will occur on stage s+1 if clusters i and j are
combined, $v_{ik}$ contains 2/w times the average pairwise loss
which will occur on stage s+1 if clusters i and k are combined,
and $v_{jk}$ contains 2/w times the average pairwise loss
which will occur on stage s+1 if clusters j and k are combined;
(2) Assume that $v_{ij}$ is the minimum in the updated V;
(3) Then show that $v'_{ki}$ will be 2/w times the average pairwise
loss which will occur on stage s+2 if cluster k is combined
with the new cluster called i which was formed at
stage s+1; (4) Then show that an element of the input matrix
is 2/w times the loss for a pair of initial groups; (5) So,
by induction, all elements of the updated V matrix will contain
2/w times the average pairwise loss when all possible
pairs are formed by putting members of the cluster identified
by the row subscript with members of the cluster identified
by the column subscript of the element of the updated V
matrix.

a.  By assumption (1) under "Method," there are $W_i \cdot W_k$
losses involved in $v_{ik}$; hence, $W_i \cdot W_k (w/2) v_{ik}$ is the sum of
$W_i \cdot W_k$ losses when members of cluster i are paired with members
of cluster k.

b.  Similarly, $W_j \cdot W_k (w/2) v_{jk}$ is the sum of $W_j \cdot W_k$ losses
when members of cluster j are paired with members of cluster
k.

c.  The total number of losses to be considered when members
of cluster i are paired with members of cluster k and
members of cluster j are paired with members of cluster k is

$W_i \cdot W_k + W_j \cdot W_k = W_k(W_i + W_j)$.

d.  Therefore, $(W_i \cdot W_k (w/2) v_{ik} + W_j \cdot W_k (w/2) v_{jk})/(W_k(W_i + W_j))$
$= (w/2)(v_{ik} W_i + v_{jk} W_j)/(W_i + W_j)$ or $= (w/2) v'_{ki}$ is the average
pairwise loss when members of cluster i are paired with members
of cluster k and members of cluster j are paired with
members of cluster k.

e.  Hence, $v'_{ki}$ = (2/w) times the average loss which would
occur on stage s+2 if clusters i and j existing at the end of
stage s are combined on stage s+1.

f.  The two-group interaction model squared multiple for
initial groups i and j with equal n's is $(1/2)(r_{ii} + r_{jj})$.

g.  The vector of beta weights for the combination of
initial groups i and j is $(1/2)(B_i + B_j)$; the vector of
validity coefficients is $(1/2)(T_i + T_j)$; the squared multiple
is then

$(1/4)(B_i T_i + B_j T_j + B_i T_j + B_j T_i) = (1/4)(r_{ii} + r_{jj} + r_{ij} + r_{ji})$.

h.  The loss for pairing groups i and j is

$(1/4)(r_{ii} + r_{jj} - r_{ij} - r_{ji}) = (w/2) v_{ij}$

(see paragraph 3h).

i.  Hence, the element $v_{ij}$ of the input matrix V is (2/w)
times the pairwise loss for initial groups i and j.

j.  Since steps a through e do not use the assumption of
proportional sums of squares and cross-product predictor
matrices and of common criterion means and s.d.'s, option 4
can be shown to minimize average loss in pairwise $R^2$ if
some other input matrix V can be constructed which reflects
loss in pairwise $R^2$ with an expression using the elements
of matrix R and other information so as not to involve these
assumptions. However, initial values of the $w_i$ must be set at 1
even if another input matrix V is used.
Top Tasks from Six Medical Laboratory
Technician Job Types


Bio  Chemistry  Job  Type 

Perform  Liver  Function  Tests 

Perform  NPN  and  BUN  Tests 

Operate  Spectro  Photometer 

Perform  Calcium  and  Phosphorus  Tests 

Perform  Total  Protein  and  A  G  Ratio 

Total  Cholesterol  and  Esters  Test 

Utilize Colorimetric Procedure

Perform  URIC  Acid  Tests 

Perform  Carbon  Dioxide  Determinations 

Perform  Enzyme  Analyses 

Perform  Chlorides  Tests 

Prepare  Reagents  and  Standards 

Perform  Electrolyte  Determinations 

Collect  Blood  Specimens  Directly  from  Patients 

Perform  Carbohydrates  Tolerance  Tests 

Operate  Flame  Photometer 

Perform  Creatinine  Tests 

Prepare  Reagents 

Perform  Prothrombin  Time  Test 

Prepare  Solutions  and  Standards 

Clean  Area  Equipment  Aseptically 

Separate  Serum  from  Blood 

Prepare and Process Specimens

Centrifuge  and  Separate  Serum  from  Clot 

Utilize Titrimetric Procedure

Blood  Bank  Job  Type 

Crossmatch  Blood 

Test  Blood  for  ABO  Grouping  and  ABO  Subgrouping 

Type Blood of Donors and Recipients

Test  Blood  for  RHO  or  DU  Factors 

Store  Blood  According  to  Grouping  and  Factor 

Centrifuge  and  Separate  Serum  from  Clot 

Prepare  Blood  for  Shipment 

Maintain  Files  of  Blood  Banking  Forms 

Perform  Direct  and  Indirect  Coombs  Tests 

Record  Information  on  Blood  Record  Card 

Prepare  and  Process  Specimens 

Heterophile  Presumptive  and  Differential  Antibody  Test 
Blood Bank Job Type (Cont'd)


Collect Blood Specimens Directly from Patients

Dispose of Blood after Time Limit

Perform  Cardiolipin  Microflocculation 

Perform  C  Reactive  Protein  Tests 

Perform  Latex  Fixation  Test 

Log  Incoming  or  Outgoing  Specimens 

Draw  Blood  for  Transfusions 

Maintain  Donor  Files 

Process  Blood  for  Packed  Cells 

Hematology  Supervisor  Job  Type 

Perform Hematocrit Tests
Perform  Blood  Count 
Prepare  Blood  Smears 

Perform Erythrocyte Sedimentation Rate

Identify Morphological Variations of Blood Cells

Perform  Reticulocyte  Count 

Perform  Sickle  Cell  Preparations 

Separate  Serum  from  Blood 

Identify  Immature  Blood  Cells 

Perform  Eosinophile  Counts 

Determine  Coagulation  Times  by  Lee  White  Method 
Perform  Spinal  Fluid  Cell  Counts 
Requisition  Supplies  and  Equipment 
Perform  Thrombocyte  Count 

Determine  Coagulation  Times  by  Capillary  Method 
Determine  Bleeding  Time  Ivy  Method 
Collect  Blood  Specimens  Directly  from  Patients 
Perform Differential Cell Counts

Perform Clot Retraction Test
Determine  Bleeding  Time  Duke  Method 
Perform  Cerebrospinal  Fluid  Count 
Perform  Erythrocyte  Indices 

Bacteriology  Job  Type 

Prepare  Culture  Media 

Clean Area and Equipment Aseptically

Identify and Classify Pathogenic Bacteria

Perform  Antibiotic  Sensitivity  Test 

Stain  Bacteriological  Smears 

Examine  Specimens  Microscopically 
Bacteriology Job Type (Cont'd)


Identify  Protozoans  Cestodes  Nematodes  or  Trematodes 

Collect  Skin  Specimens  Directly  from  Patients 

Perform  Concentration  and  Flotation  Techniques 

Collect  Pus  Specimens  Directly  from  Patients 

Log  Incoming  or  Outgoing  Specimens 

Perform  Bacteriological  or  Chemical  Exam  of  Water 

Collect  Fecal  or  Urine  Specimens  from  Patients 

Stain  Mycology  Specimens 

Stain  Parasitological  Smears 

Prepare  Solutions  and  Standards 

Maintain  Files  of  Laboratory  Records  and  Reports 

Investigate  Possible  Sources  of  Staphylococcus  Outbreaks 

Perform  Sperm  Counts 

Cultivate  Mycology  Specimens  for  Primary  Isolation 
Perform  KOH  Preparation  for  Dermatophytes 
Identify  and  Classify  Fungi 

Collect  Sputum  Specimens  Directly  from  Patients 
Histopathology Technician Job Type
Section  Tissue  in  Microscopic  Blocks 

Mount  Tissue  Section  in  Preparation  for  Microscopic  Study 

Embed Tissue in Paraffin

Stain  Specimens  for  Microscopic  Study 

Prepare  Routine  Stains 

Prepare  Tissue  for  Dehydration  and  Infiltration  of  Paraffin 

Assist with Autopsy

Prepare  Special  Stains 

Log  Incoming  or  Outgoing  Specimens 

Use  Autotechnicon 

Prepare  and  Process  Specimens 

Decalcify  Specimens  of  Teeth  and  Bone 

Prepare  Specimens  for  Shipment 

Submit  Tissue  Specimens  to  AFIP  or  Histopathology  Centers 

Prepare  Frozen  Section  of  Tissue 

Use  Microtome  Knife  Sharpener 

Clean  Area  and  Equipment  Aseptically 

Collect  Biopsy  or  Autopsy  Specimens 

NCOIC  Job  Type 

Evaluate  Work  Performance  of  Subordinates 
Resolve  Technical  Problems  of  Subordinates 
Assure  the  Availability  of  Equipment  and  Supplies 


NCOIC Job Type (Cont'd)


Assign  Specific  Work  to  Individuals 

Evaluate  the  Accuracy  of  Routine  Reports 

Develop  and  Improve  Work  Methods  and  Procedures 

Plan  Reports  for  the  Section 

Plan and Schedule Work Assignments

Direct Maint Utilzn of Equip Supplies and Work Space

Determine Equipment Repairs or Replacements Needed

Evaluate Compliance with Established Work Standards

Supervise On-the-Job Training Programs

Evaluate the Adequacy of Routine Reports

Coordinate Work Activities with other Sections

Establish  Work  Priorities 

Show  How  Locate  and  Interp  Technical  Information 
Assist  Officer  in  Charge  Estab  Organizational  Policy 
Evaluate  Individuals  for  Promotions  and  Upgrading 
Recommend Special Corrective Action for Recurring Problems
Rotate  Duty  Assignments  of  Personnel 
Figure 1. 90450 Medical Laboratory Technician
Figure 2. Data Illustrating Individual Job Descriptions
Common Worktime, Consolidated Job Description, and
FOOTNOTE

The "MAXOF Cluster Model" has been available and utilized for more
than five years. However, this is the first time that the model has been
given a name. It has on occasion been referred to as "The Personnel
Research Laboratory Hierarchical Grouping Model." In other instances,
applications of the model have been given a title, such as "The PRL
Job-Type Analysis Program" or "The Iterative Criterion Clustering
Program." None of these titles is descriptive or easy to remember.
Hopefully, future papers will be consistent in applying the name given
in this paper, so as to avoid further confusion of the readers.
THE MULTIVARIATE ANALYSIS OF QUALITATIVE DATA^1,2


James C. Lingoes
The University of Michigan^3


Although a large number of linear techniques have been proposed for the
multivariate treatment of quantitative data (Ball, 1965), little has been advanced
for the multidimensional analysis of nominal data. Indeed, in some quarters
(Torgerson, 1958), nominal or classificatory variables do not merit the status
of scales and are, therefore, not deserving of any serious consideration in a
book on scaling theory and methods.

When an investigator is confronted with categorical variables in the context
of their more respectable brethren, quantitative variables, and he is,
nevertheless, determined to analyze them, typically he resorts to constructing
"dummy variables" from the various categories before proceeding with a standard
linear analysis. Alternatively, the researcher may eliminate nominal variables
from the analysis proper (save, perhaps, for that omnipresent dichotomy of sex),
resting content to later use the categorical data in a descriptive capacity for
talking about his results and interpretations. Each of these strategies has its
associated dangers. In brief, for the former choice, an element of artificiality
is introduced along with all the problems attendant upon having uneven or extreme
marginal distributions - factors which may cause difficulties in interpretation
and result in a loss of parsimony. In the latter choice, a very real risk may be
incurred in assigning appropriate weights to the qualitative data because of their
univariate treatment and, as a consequence, some important cues may be lost for
refinement of research design.

This  paper  will  be  concerned  with  presenting  the  rationale  and  details  of 
three  possible  approaches  to  the  multivariate  analysis  of  nominal  data.  The 

1.  An  invited  paper  presented  at  the  Conference  on  Cluster  Analysis  of  Multivariate 
Data,  held  in  New  Orleans,  Louisiana  on  12/9-11/66. 

2.  This  research  in  nonmetric  methods  is  supported  in  part  by  a  grant  from  the 
National Science Foundation (GS-929).

3.  Prepared  while  on  leave  to  The  University  of  California,  Berkeley. 
three procedures to be discussed in turn are: 1) Multivariate Analysis of
Contingencies - II (Guttman, 1941; Lingoes, 1963b; 1964); 2) Multidimensional
Scalogram Analysis - I (Guttman, see footnote 4; Lingoes, 1966a); and, 3)
Multidimensional Scalogram Analysis - II (Guttman, 1967; Lingoes, 1967c), which,
for brevity, are labeled: MAC-II, MSA-I, and MSA-II, respectively.


Multivariate  Analysis  of  Contingencies 

The general problem of multidimensional analysis is concerned with the three
basic facets of persons (P), variables (J), and categories ($C_j$), the subscript on
C always being implied, when not expressly written, to denote that a category
belongs to an item, i.e., it has no independent status. Given these sets, our task
is one of mapping P into $C_j$ ($j \in J$) or, symbolically, $P \to C_j$. The
characteristic function of the three sets is:


1) $e_{pjc} = \begin{cases} 1, & \text{if } p \to c \text{ for } j \\ 0, & \text{otherwise} \end{cases}$

The binary matrix defined by 1) is called the attribute or trait matrix E,
representing the general model for both quantitative and qualitative data. Under the
condition of having mutually exclusive and exhaustive categories (which can always
be effected by a proper choice or definition):

2) $\sum_{c \in C_j} e_{pjc} = 1$ ($j \in J$; $p \in P$);

3) $\sum_{j \in J} \sum_{c \in C_j} e_{pjc} = n$, the number of variables in set J ($p \in P$);

4) $\sum_{p \in P} \sum_{c \in C_j} e_{pjc} = N$, the number of persons in set P ($j \in J$);


4. Guttman, L. Unpublished lectures on multidimensional analysis given at The
University of Michigan, 1965. Much in the above and following discussion is based
upon  these  lectures.  Some  minor  changes  in  notation  and  terminology  have  been 
made  to  conform  with  a  prior  presentation  (Lingoes,  1963b). 




5) $\sum_{c \in C_j} 1 = k_j$, the number of categories in $C_j$ ($j \in J$; $p \in P$);

6) $\sum_{j \in J} k_j = n_c$, the number of categories over all items; and

7) $\sum_{p \in P} e_{pjc} = n_{jc}$, or the number of persons in category c of item j.

[$\pi_{jc} = n_{jc}/N = \frac{1}{N} \sum_{p \in P} e_{pjc}$, or the probability of variable j
falling in category c = the relative frequency = the expected value of $e_{pjc}$ for
the population P.]

A  universal  property  of  the  characteristic  function  is  that  any  scoring 
scheme for persons can be formulated in terms of the product of a set of weights
and  the  elements  of  E: 


8) $\sum_{j \in J} \sum_{c \in C_j} w_{jc} e_{pjc} = x_p$ ($p \in P$).

An example of a simple scoring system would be a vector of 1s and 0s for
dichotomous items, e.g., the number of correct answers. A more complicated scoring
system might make adjustments for guessing, etc. In all cases, however, individuals
are placed into score classes such that one person or group of persons is
distinguishable from another person or group. The scoring problem can thus be seen
as being equivalent to that of determining the partitions of P under specified
constraints. What aspects of E are we interested in classifying? The answer to
this question will specify the kinds of constraints needed for finding a solution
to the unknown weights and scores. Some may be interested in the principal
components of scales (Guttman, 1950); others in scale homogeneity (Dempsey, 1963);
some in optimizing discriminant functions (Bryan, 1961); yet others in reducing
dimensionality within the framework of common factor theory (Butler, et al., 1963);
and others in maximizing linearity as a basis for both typing objects and
determining a smallest space nonmetric solution for variables (Lingoes, 1963b; 1964;
1966b). Again, other interests, as in the MSA-I and MSA-II approaches, will suggest
alternative restrictions on the solutions to the partitioning problem. All, however,
are based upon E and all, with the exception of MSA-II, use the basic theory and
equations worked out by Guttman (1941) a quarter of a century ago.
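
As a minimal sketch of equations 1) through 8), assuming NumPy and hypothetical names (`indicator_matrix`, `responses`, `weights`) that do not appear in the original, the trait matrix E can be built from integer category codes and a scoring scheme applied as a weighted sum of its elements. The data and weights are invented purely for illustration.

```python
import numpy as np

def indicator_matrix(responses, n_categories):
    """Build the n_c x N binary trait matrix E (hypothetical helper).

    responses[p][j] is the zero-based category code of person p on item j;
    n_categories[j] is k_j, the number of categories of item j.
    """
    responses = np.asarray(responses)
    N, n = responses.shape
    # Row offset of the first category of each item within E.
    offsets = np.concatenate(([0], np.cumsum(n_categories)[:-1]))
    E = np.zeros((int(np.sum(n_categories)), N), dtype=int)
    for j in range(n):
        E[offsets[j] + responses[:, j], np.arange(N)] = 1
    return E

# Illustrative data: N = 4 persons, n = 2 items with k = (2, 3) categories.
responses = [[0, 1], [1, 2], [0, 0], [1, 1]]
E = indicator_matrix(responses, [2, 3])

# Exclusive and exhaustive categories: every column of E sums to n (equation 3).
column_sums = E.sum(axis=0)
# Row sums are the category frequencies n_jc (equation 7).
n_jc = E.sum(axis=1)

# Equation 8: an arbitrary set of category weights gives one score per person.
weights = np.array([0.0, 1.0, -1.0, 0.0, 1.0])
scores = weights @ E
print(column_sums, n_jc, scores)
```

The matrix is stored with categories as rows and persons as columns so that it matches the $n_c \times N$ order used for E in the MAC-II steps.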


The MAC-II Basic Equations

Matrix notation will be used for outlining the initial steps of the MAC-II
algorithm.


STEP 1: Define an $n_c$-order diagonal matrix F, whose $f_{ii}$ elements = the
number of Ss falling in the $i$th category, i.e., $n_{jc}$. If there are n variables
and N Ss then tr(F) = nN.

STEP 2: Form the matrix product $F^{-1/2}E = G$, where E and G are $n_c \times N$
order matrices.

STEP 3: Compute G'G = M, where M is an N-square Gramian matrix with typical
element:

$m_{pq} = \sum_{j \in J} \sum_{c \in C_j} \frac{e_{pjc} e_{qjc}}{n_{jc}}$ ($p, q \in P$; $0 \le m_{pq} \le n$),

or the number of categories that p and q share weighted inversely by the number
of persons falling in the shared categories. Tr(M) = $n_c$ and the maximum rank of
M, $\rho(M) = n_c - n + 1$ (for $N \ge n_c$).

STEP 4: Solve the eigenequation: $(M - \lambda I)S = 0$, where I is an order N
identity matrix and $\lambda$ and S, respectively, are the roots and vectors
satisfying the equation. The largest root of the solution = n and the elements of its
associated vector of unit length = $(1/N)^{1/2}$, a constant, representing the
substantively trivial but formally important solution corresponding to removing
"chance expectation" as in chisquare analysis. Indeed, what we are solving for are
the orthogonal components of chisquare and the resultant metric is that of chisquare.
The mean score for each independent score vector = 0 as a consequence of the
constant solution.
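
STEPS 1 through 4 can be sketched as follows. This is an illustrative reading rather than the MAC-II program itself: it assumes NumPy, a small invented indicator matrix, and a hypothetical function name `mac2_roots`. It also assumes that no category is empty, since F must be invertible.

```python
import numpy as np

def mac2_roots(E):
    """Roots and unit-length vectors of M = G'G for an n_c x N indicator matrix E."""
    E = np.asarray(E, dtype=float)
    f = E.sum(axis=1)                    # STEP 1: diagonal of F (category counts)
    G = E / np.sqrt(f)[:, None]          # STEP 2: G = F^(-1/2) E
    M = G.T @ G                          # STEP 3: N-square Gramian matrix
    roots, vectors = np.linalg.eigh(M)   # STEP 4: eigh returns ascending roots
    order = np.argsort(roots)[::-1]      # reorder so the largest root comes first
    return M, roots[order], vectors[:, order]

# Invented example: n = 2 items with 2 and 3 categories, N = 4 persons.
E = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 0]])
n, n_c, N = 2, 5, 4

M, roots, vectors = mac2_roots(E)

print(np.trace(M))             # should agree with n_c
print(roots[0])                # should agree with n, the trivial largest root
print(np.abs(vectors[:, 0]))   # constant elements of size (1/N)^(1/2)
```

The sign of an eigenvector is arbitrary, which is why the constant vector is inspected through its absolute values.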

Although this paper will not be concerned with determining optimal category
weights (the mean of the distribution of scores for persons falling in various
categories), which serve as the basis for the nonmetric factor analysis of
variables by SSA-III (Lingoes, 1966b; Lingoes & Guttman, in preparation) in the
later part of the MAC-II program, the significance of the roots should be
commented upon. If we order the roots of M (excluding the trivial root
$\lambda = n$) as follows: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_\rho$, then the
total chisquare can be obtained from:

9) $\chi^2 = N \sum_{j=1}^{\rho} (\lambda_j - 1)^2$, having $\rho(N-1)$ or $(n_c - n)(N-1)$ degrees of

freedom. On the other hand, the $j$th partition of chisquare:

10) $\chi_j^2 = N(\lambda_j - 1)^2$, and has $[(\rho - 1) + (N - 1) - 2(j - 1)]$ degrees of
freedom. Corresponding to each root, however, there is a correlation ratio:

11) $\eta_j^2 = (\lambda_j - 1)/(n - 1)$, which varies between 0 and 1 and measures the

covariance among variables. If all bivariate regressions are linear or can be
made linear by finding an optimal set of scores (in the least squares sense of
MAC-II), then the correlation ratio will equal the average intercorrelation
among the variables. This fact, of course, is exploited in the MAC series for
linearizing (Lingoes, 1964) relationships among quantitative variables, e.g.,
when nonlinear relationships may be present.

By introducing the concept of statistical significance we are afforded a
rationale for attending to but a subset of the $\rho$ vectors for further analysis.
Let m = the number of significant roots in our solution. Each person then can be
plotted in an m-dimensional Euclidean space and our problem reduces to that of
determining the further partitions of this subspace in terms of salient clusters.
Although a number of techniques could be used for clustering, once the proper
subspace has been determined, a hierarchical clustering method, max-min cluster
analysis (Lingoes, 1963b), based upon a perceptual and statistical model, was used
and is described below.

Max-min Clustering

STEP 1: Normalize the unit length vectors of scores to the length of their
associated roots, i.e., form SH = X, where H is the m-square diagonal matrix of
etas and S and X are the order $N \times m$ matrices of unit length score vectors
and normalized scores, respectively.

STEP 2: Calculate the N-square matrix of Euclidean distances among the
N(N-1)/2 pairs of persons according to the standard distance formula:

12) $d_{pq} = \left[\sum_{i=1}^{m} (x_{pi} - x_{qi})^2\right]^{1/2}$ ($p = 1, 2, \ldots, N-1$; $q = p+1, p+2, \ldots, N$).

STEP 3: Compute $\bar{d}$ and $s_d$ from the off-diagonal elements of the distance
matrix D. Set Level, $l = 0$; compute $\Delta = s_d/8$; and set Radius, $r_l = s_d/2 - \Delta$.

STEP 4: Set $l = l + 1$ and $r_l = r_{l-1} + \Delta$. If $l = 14$, or if $l = 6$ and less
than [illegible in source] has been clustered, or if the number of clusters is less
than 3, terminate clustering. Otherwise proceed to next step.

STEP 5: Sitting on each point in turn in the m-dimensional space determine
the number of points which are within the current radius criterion for each.
Select that point accounting for the most points within the radius as a new cluster
(breaking any ties in favor of that point having the smallest sum of squared
distances between the centroid of the cluster and those points within its orbit).
If any point represents a cluster of persons, then the number of individuals in
that cluster is considered whenever a new cluster is to be formed.

Excluding points already classified at a fixed level, determine that point
which accounts for the next largest number of individuals, iteratively, until no
further clusters can be formed at the given criterion. Compute the centroids of
all clusters formed within the radius of inclusion and the reduced matrix of
interpoint distances. Go to STEP 4.
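
The radius schedule of STEPS 3 and 4 and the greedy selection of STEP 5 can be sketched for a single level as follows. This is a simplified illustration, not the original program. Assumed here and not stated in the source: NumPy, the function names, unweighted points (no cluster of persons standing in for a point), a minimum cluster size of two, the population form of the standard deviation, and ties beyond the centroid rule being broken by point index. The recomputation of centroids and of the reduced distance matrix between levels is omitted.

```python
import numpy as np

def pairwise_distances(X):
    """N-square matrix of Euclidean distances between the rows of X (equation 12)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def radius_schedule(D, last_level=13):
    """Radii r_1, r_2, ... from STEPS 3 and 4: r_l = s_d/2 + (l - 1) * s_d/8."""
    off_diagonal = D[np.triu_indices_from(D, k=1)]
    s_d = off_diagonal.std()
    delta = s_d / 8.0
    r0 = s_d / 2.0 - delta
    return [r0 + level * delta for level in range(1, last_level + 1)]

def cluster_one_level(X, radius, min_size=2):
    """Greedy selection of STEP 5 at one fixed radius (simplified sketch)."""
    D = pairwise_distances(X)
    free = np.ones(len(X), dtype=bool)       # points not yet classified
    clusters = []
    while free.any():
        candidates = []
        for p in np.flatnonzero(free):
            members = np.flatnonzero(free & (D[p] <= radius))
            centroid = X[members].mean(axis=0)
            spread = ((X[members] - centroid) ** 2).sum()
            candidates.append((-len(members), spread, int(p), members))
        # Most points first; ties go to the smallest sum of squares about the centroid.
        best = min(candidates, key=lambda c: (c[0], c[1], c[2]))
        if -best[0] < min_size:
            break
        clusters.append(best[3].tolist())
        free[best[3]] = False
    return clusters

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]])
clusters = cluster_one_level(X, radius=1.5)
radii = radius_schedule(pairwise_distances(X))
print(clusters)
```

In this invented configuration the two tight pairs form clusters and the remote fifth point is left unclassified at this radius.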

Some  useful  statistics  that  might  be  calculated  at  each  level  are:  1)  the 
mean  and  standard  deviation  of  the  off-diagonal  distances;  and  2)  for  each 
cluster  comprising  four  or  more  individuals:  a)  the  mean  and  variance  of  the 
distances from the origin for constituents; b) the interpoint distances
between all such pairs of clusters; and c) t-tests (based upon pooled estimates
of  variance)  between  pairs  of  clusters. 

The above rather simple clustering procedure results in a tree where Ss are
classified in only one cluster at a given level and are never reclassified on the
basis that the cluster might thereby be improved. Based upon the statistics
computed for each level in conjunction with extra-statistical considerations (e.g.,
number of clusters desired, number of Ss unclassified, meaningfulness, etc.), the
investigator is free to select that level of partitioning most appropriate to his
purposes. Indeed, the hierarchical approach has as one of its chief virtues this
kind of freedom of selection. Among the statistics calculated at each level,
the $\bar{d}$ and $s_d$ of the interpoint distances have often proved helpful as guides.
Generally, as radius increases so does mean distance among points, but not
uniformly nor for all problems at the same level. On the other hand, the standard
deviation does not behave as consistently as a function of level. For most
purposes choosing that level for which the coefficient of variation is a minimum has
proven optimal, i.e., $V = 100 s_d/\bar{d}$.
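
The level-selection rule can be illustrated with a short sketch. It assumes NumPy, a hypothetical function name, the population form of the standard deviation, and invented distance matrices standing in for the reduced matrices of two successive levels.

```python
import numpy as np

def coefficient_of_variation(D):
    """V = 100 * s_d / d_bar over the off-diagonal elements of a distance matrix D."""
    d = D[np.triu_indices_from(D, k=1)]
    return 100.0 * d.std() / d.mean()

# Invented reduced distance matrices for two levels of a hypothetical tree.
levels = {
    1: np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]),
    2: np.array([[0.0, 4.0, 5.0], [4.0, 0.0, 4.5], [5.0, 4.5, 0.0]]),
}
V = {level: coefficient_of_variation(D) for level, D in levels.items()}
best_level = min(V, key=V.get)   # level whose coefficient of variation is smallest
print(V, best_level)
```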
In summary, starting with a completely general binary matrix E, representing
a subject by category classification, for any set of categories (quantitative
and/or qualitative), a solution for a set of real numbers (chisquare metric) is sought
to replace the arbitrary category captions such that the covariance matrix among
variables is maximized. We impose the restrictions that each scoring system be
orthogonal to every other, each successively accounts for the maximum amount of
remaining variance, and the number of such scoring systems be a minimum. Since
we are not interested in exactly reproducing E, something less than a full set
of vectors is required. To the extent that sampling and measurement errors may
have entered into the determination of E, we have invoked the statistical concept
of significance for selecting the appropriate subset of solutions. Finally, as a
way of looking at and organizing the configuration of points in m-dimensional
space, we have introduced a clustering procedure whereby each point appears within
a set of spherical envelopes of varying radii such that mutually exclusive sets of
these spheres define a typology and give us a feel for the distribution of points
free of considerations in respect to the origin, rotation, or orientation of the
principal axes. Thus, three kinds of partitioning are involved in MAC-II: 1) at
the point where the number and kinds of categories have been decided upon (mainly
a psychological problem), 2) at the level where the category captions have been
replaced by a set of optimal weights (a problem of numerical analysis), and 3) at
the level where a subspace is defined and clusters are sought (a problem involving
statistical, perceptual, and psychological considerations). The first and third
partitions involve components of subjectivity and arbitrariness, while the second
is completely objective and results in a unique solution (given the initial set of
categories and the function that is being maximized).

Although most applications of MAC have been restricted to quantitative data,
some have involved combinations of quantitative and qualitative variables and some
predominantly qualitative data (McPherron, 1963), yielding most interesting results.
More studies, however, are required to properly assess the potentials and
limitations of this technique for the multidimensional treatment of categorical data.

A  redefinition  of  our  goals  and  the  kinds  of  constraints  imposed  suggests 
yet  another  way  of  looking  at  the  basic  data  matrix  E. 

Multidimensional Scalogram Analysis - I

The essential task set for MSA-I is this: given the N points embedded in a
subspace defined by the m largest vectors of X (the normalized score vectors),
can we transform the coordinates such that for a fixed item all individuals
falling within a given category will be placed in a contiguous region of that
space? We are thus seeking a definition of category boundaries yielding regions
of indefinite contours (the nature of the boundaries is not specified) where
each item represents a partitioning of the space. In order to solve this problem
we need to specify how the boundaries are to be determined such that contiguous
regions are ensured and, further, what is the nature of the loss function
to be minimized, i.e., how are we to evaluate noise?

Consider a given partition $j \in J$ of the m-dimensional subspace defined by X;
the points falling within a specified category ($c \in C_j$, $j \in J$) will not, in
general, fall within a region all of whose members belong to that category. For each
point not belonging to c, however, say, one belonging to $b \in C_j$, there is a
closest point that does belong to the category c; such a closest point is defined as
a trial "outer-point" of category c. For each person-point in turn we can define a
set of outer-points. Further, all points not classifiable as outer-points will be
considered as "inner-points". Now, the set of points falling in a fixed category,
outer- and inner-points alike, are defined as being contiguous iff each (if any)
inner-point is closer to some outer-point of the same category than it is to any
outer-point of any alternative category of the same item. More formally, given item
$j \in J$ and $c \in C_j$ and any three points $p, q, r \in P$, if
$e_{pjc} e_{qjc} (1 - e_{rjc}) = 1$ and $d^2_{pr} \le d^2_{qr}$ for all such q, then
$\alpha_{pj} = 1$ (p is an outer-point of c for j), otherwise $\alpha_{pj} = 0$ (p
is an inner-point of c for j), where $d^2_{pr}$ is the squared Euclidean distance
between the two points, p and r. The remaining algebra is concerned with an
explicit statement of the function to be maximized (Guttman's coefficient of
contiguity, $\lambda$), how the coordinates are to be modified in order that the
foregoing function is maximized, and how we are to control or modulate the
convergence process. The MSA-I program (Lingoes, 1966a) is completely adequate for
the analysis of quantitative as well as qualitative data, dichotomous as well
as n-chotomous variables, monotone and/or polytone items, and involves no
assumptions whatsoever about scaling properties or distributions.
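
The definitions of outer-points, inner-points, and contiguity for a single item can be sketched as below. This is an illustrative reading of the verbal definition only, not the MSA-I program: NumPy and the function names are assumptions, ties in distance are broken by point index, and the example coordinates are invented.

```python
import numpy as np

def squared_distances(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)

def outer_points(X, codes):
    """Mark p as an outer-point if it is the closest member of its own
    category to at least one point r lying outside that category."""
    D2 = squared_distances(X)
    outer = np.zeros(len(X), dtype=bool)
    for r in range(len(X)):
        for c in np.unique(codes):
            if codes[r] == c:
                continue
            members = np.flatnonzero(codes == c)
            outer[members[np.argmin(D2[r, members])]] = True
    return outer

def is_contiguous(X, codes):
    """True iff every inner-point is closer to some outer-point of its own
    category than to any outer-point of another category of the item."""
    D2 = squared_distances(X)
    outer = outer_points(X, codes)
    for p in np.flatnonzero(~outer):
        own = outer & (codes == codes[p])
        other = outer & (codes != codes[p])
        if own.any() and other.any() and D2[p, own].min() >= D2[p, other].min():
            return False
    return True

# Two well separated categories on a line: contiguous.
X1 = np.array([[0.0], [1.0], [2.0], [5.0], [6.0], [7.0]])
codes1 = np.array([0, 0, 0, 1, 1, 1])
# Interleaved categories: not contiguous.
X2 = np.array([[0.0], [1.1], [2.0], [3.3]])
codes2 = np.array([0, 1, 0, 1])

print(outer_points(X1, codes1), is_contiguous(X1, codes1))
print(outer_points(X2, codes2), is_contiguous(X2, codes2))
```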


The  MSA-I  Basic  Equations 

STEP 1: Determine what outer-point of its own category p, as an inner-point,
is closest to, i.e., if q and r are outer-points of p's own category for item j
and $d^2_{pq} < d^2_{pr}$ for every such $r \ne q$, then $\beta_{pqj} = 1$, and otherwise
$\beta_{pqj} = 0$. If $\alpha_{pj} = 1$, then $\beta_{pqj} = 0$ for all q.

STEP 2: Determine what outer-point of another category p, as an inner-point,
is closest to, i.e., if q and r are outer-points of another category of item j
and $d^2_{pq} < d^2_{pr}$ for every such $r \ne q$, then $\gamma_{pqj} = 1$, otherwise 0.

STEP 3: Where $\beta_{pqj}$ is an n element column vector and $\gamma_{pqj}$ is an n element
row vector compute: $\delta_{pqr} = \sum_{j=1}^{n} \beta_{pqj} \gamma_{prj}$.

STEP 4: Compute the sign matrix: $S_{pqr} = \mathrm{sgn}(d^2_{pr} - d^2_{pq}) = -S_{prq}$.

STEP 5: Calculate: $\delta^*_{pqr} = S_{pqr} \cdot \delta_{pqr}$, i.e., modify $\delta_{pqr}$ according to
whether the sign of the difference between the squared distances is + or -.
STEP 6: Compute: $n_{pq} = \sum_r f_r(\delta_{prq} - \delta_{pqr})$ and $n^*_{pq} = \sum_r f_r(\delta^*_{prq} - \delta^*_{pqr})$,
where $f_r$ = the number of persons in the $r$th type, i.e., individuals having
identical profiles over the n variables, and N = the number of types rather than
persons. (N.B. If each of the points is to be weighted by this N element frequency
vector, then the initial configuration based on the N types should be adjusted so
that the weighted mean of X is zero for each vector. Letting U represent the
unweighted normalized score vectors, then:
$x_{pa} = u_{pa} - \sum_r f_r u_{ra} / \sum_r f_r$ ($p = 1, 2, \ldots, N$; $a = 1, 2, \ldots, m$).
X will now be referred to as the weighted normalized score matrix.)

STEP 7: Calculate the N-square matrix M with typical off-diagonal element:
$m_{pq} = -(n_{pq} + n_{qp})$ and typical diagonal element: $m_{pp} = -\sum_{q \ne p} m_{pq}$.
The row and column sums of M = 0.

STEP 8: Similarly compute M* by substituting $n^*_{pq}$ for $n_{pq}$ and $n^*_{qp}$ for $n_{qp}$.
M* will also be an N-square matrix whose rows and columns sum to zero.

STEP 9: Calculate: $w_{pa} = f_p x_{pa}$.


STEP 10: Determine: $\lambda$ = [equation illegible in source], the coefficient of
contiguity, which varies between -1 (representing perfect discontiguity) and +1
(representing perfect contiguity).

STEP 11: If t (the number of iterations, initially set = 1) = some preset
number or if $\lambda$ = some predetermined cut-off point, then increase m to m+1 and
reset t = 1, provided that more dimensions are required to get a good fit, otherwise
terminate. When going to a higher dimensionality one always starts with the
initial configuration in order that the metric be comparable from one set of
dimensions to that which is appended. If neither of the first two conditions obtains,
go to the next section for computing a new trial set of coordinates.
STEP 12: Calculate the $N \times m$ matrix: Y = [equation illegible in source], a
basis for modifying X.

STEP 13: Compute: $\beta = \sum_{a=1}^{m} \sum_{p=1}^{N} f_p y^2_{pa}$.

STEP 14: If t = 1 calculate the scalar c = [illegible in source](1 - $\lambda$),
otherwise: c = [illegible in source], otherwise set c = 1.

STEP 15: If t = 1 compute the scalar: $\sigma$ = [illegible in source](1 - $\lambda$),
otherwise: $\sigma$ = [illegible in source].

STEP 16: Also compute the scalar: $\alpha = \sum_{a=1}^{m} \sum_{p=1}^{N} f_p x^2_{pa}$.

STEP 17: Compute the multiplicative scalar which keeps the results of
adjacent iterations highly correlated:

$k = [(\alpha c \sigma)/(\beta(1 - c\sigma))]^{1/2}$, a value which is a monotonic decreasing function of
t and never reaches zero unless and until $\lambda$ = 1.

STEP 18: Compute the new set of coordinates: Z = X + kY.

STEP 19: Set new coordinates equal to the initial squared Euclidean norm:

$x_{pa} = z_{pa} \left[\alpha / \sum_{a=1}^{m} \sum_{p=1}^{N} f_p z^2_{pa}\right]^{1/2}$ ($p = 1, 2, \ldots, N$; $a = 1, 2, \ldots, m$).

STEP 20: Set t = t+1, compute the matrix of squared Euclidean distances, and
return to STEP 1.

The above "average steepest ascent" algorithm in general converges in a few
iterations, but is a time-consuming process in that each cell of M and M* involves
N(N-1)/2 calculations and each of these in turn is based on a large number of
computations involving the n items and the $n_c$ categories. MSA-I's complete
generality for quantitative and/or qualitative data and for linear and/or nonlinear
relationships makes it an ideal procedure for studying both numeric and conceptual
problems (e.g., facet models: Guttman, 1959). Although the solutions resulting
from MSA-I are embedded in a Euclidean space, the dimensions of this space are
not, in general, meaningful. One must look at the configuration of points in this
space and study each partition separately in terms of the properties of regions in
order to make full use of this method. Since neither rectilinear nor parallel
boundaries are insisted upon in the definition of contiguity, one loses potential
information in respect to order for both items and categories. Indeed, with its
weak definition of contiguity the method often results in what might be considered
a quasi-topological representation - very revealing and certainly fascinating from
a number of points of view.

In summary, starting with the basic data matrix E and defining a trial space
based upon the normalized weighted score matrix X, the task set for MSA-I is that
of moving the points around in this space such that a certain definition of
contiguity is satisfied in a minimum number of dimensions. Each type is a point in
Euclidean space, each item is a partitioning of this space, and each region within
a partition represents a category. A subset of the person points, i.e., those
characterized as "outer-points", define the contiguous regions which may assume
any form whatsoever. The cutting points of Guttman's earlier technique of scalogram
analysis (1944) for m = 1 lie between the outer-points of MSA-I. When m = 2
there are cutting curves and for m > 2 there are cutting surfaces separating the
boundary definers. Contrasted with the earlier method of scale analysis, not
only is the number of errors counted (as reflected in the coefficient of
reproducibility), but the size of the errors is also taken into consideration by the
coefficient of contiguity. For example, take a variable like religion whose three
categories were: a = Catholic, b = Protestant, and c = Jewish and a particular
individual p who fell in category a. Now if d_pr was the smallest squared distance
of p from all other points and r happened to define the category boundary of
Protestants (r is an outer-point for category b), then a decrement to the
coefficient of contiguity would be incurred.

As an illustration of MSA-I the following example of a purely conceptual
analysis is given.

An  MSA-I  of  Social  Structure 

Guttman's adaptation (1966) of a table appearing in Bell and Sirjamaki's
(1961; p. 325) sociology text provides a set of five characterizations by which
groups of persons can be differentiated. The five facets and their elements or
categories are as follows: 1) Intensity of Interaction (a = slight, b = low,
c = moderate, and d = high); 2) Frequency of Interaction (a = slight,
b = nonrecurring, c = infrequent, and d = frequent); 3) Feeling of Belonging
(a = none, b = slight, c = variable, and d = high); 4) Physical Proximity
(a = distant and b = close); and 5) Formality of Relationship (a = no relationship,
b = formal, and c = informal). The objects to be classified, seven in number, are
various kinds of groups, i.e.: 1) Crowd (aaabb); 2) Audience (bbbbb); 3) Public
(aabaa); 4) Mob (dbdbc); 5) Primary Group (dddbc); 6) Secondary Group
(cccab); and 7) Modern Community (bccbb). With no a priori conceptions as to
order in respect to types of groups, items, or categories within an item the
following perfect solution in two dimensions required but one iteration (see Fig. 1).
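
To make the data of this example concrete, the seven profiles can be expanded into the binary attribute matrix E (rows are categories, columns are groups). The following is only an illustrative sketch: the names FACETS, PROFILES, and attribute_matrix are hypothetical, and the Public profile is read as "aabaa" on the assumption that its first letter, which is unclear in the source, is "a".

```python
import numpy as np

# Category letters for each of the five facets, in the order listed in the text.
FACETS = ["abcd", "abcd", "abcd", "ab", "abc"]

# Group profiles as given in the text. The first letter of "Public" is
# unclear in the source; "aabaa" is an assumption.
PROFILES = {
    "Crowd": "aaabb",
    "Audience": "bbbbb",
    "Public": "aabaa",
    "Mob": "dbdbc",
    "Primary Group": "dddbc",
    "Secondary Group": "cccab",
    "Modern Community": "bccbb",
}


def attribute_matrix(profiles, facets):
    """Return E with one row per (facet, category) and one column per group."""
    rows = [(j, c) for j, cats in enumerate(facets) for c in cats]
    E = np.zeros((len(rows), len(profiles)), dtype=int)
    for p, profile in enumerate(profiles.values()):
        for r, (j, c) in enumerate(rows):
            # A 1 marks the category chosen by this group on facet j.
            E[r, p] = int(profile[j] == c)
    return E


E = attribute_matrix(PROFILES, FACETS)
```

Each column of E contains exactly five 1s (one per facet), so E here has 17 rows and 7 columns.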


Figure  1  about  here 


The rather interesting Y-configuration that emerges places Primary Group and
Mob close together and at the foot of the Y and the remaining groups in the order:
Secondary Group, Modern Community, Audience, Crowd, and Public - forming the arc.
It will also be noted that for each of the five characteristics there appears a
circular arrangement for the categories for those types appearing on the V part
of the Y, e.g., in respect to Intensity of Interaction the ordering goes from
moderate to low to slight following the above ordering for the five groups. Since
parallel straight line boundaries can be constructed for this configuration, some
purchase on item and category orders can be obtained. The items and the category
orders within items can be arranged as follows: Physical Proximity (ab), Feeling of
Belonging (cbad), Formality of Relationship (cab), then either order of the
following two: Intensity of Interaction (cbda), and Frequency of Interaction (cbda),
with the first and last items yielding cutting curves which are orthogonal to
each other and the intermediate items having slopes for their boundaries that
are at a slant. Because there are so few points and a relatively small number
of categories, alternative parallel straight line solutions are possible. For
example, Guttman's hand solution of this problem yielded the following order for
items: 45312, where within each facet the order of the categories was maintained
(1966). Furthermore, in his analysis he placed Primary Group as being closest
to Secondary Group, whereas MSA-I places these two furthest apart. Without
belaboring the point, since this is but an example, the differences between the
hand solution and the MSA-I solution may have arisen from the ambiguity of the
categories within some of the items, e.g., should the order for Frequency of
Interaction be: bacd, abcd, bcad, or some other order? We know that the first
three are opposed to the last, but there might be some question as to the best
order for the first three. Similarly for the categories of the third item. As
can be seen from a discussion of these differences, MSA-I may prove fruitful not
only for testing certain substantive issues but also for revealing preconceived
coding assumptions, e.g., that the categories follow a linear order.

We  will  now  pass  on  to  the  third  and  last  technique  of  this  paper,  MSA-II. 

Multidimensional Scalogram Analysis - II

Starting once again with the binary attribute matrix E, let us formulate a
specification that will reproduce E in the smallest possible space. One way of
looking at the basic data matrix would be that E is an incomplete proximity
matrix having but two coefficients, i.e., 1 and 0, where the rows of this matrix
represent categories and the columns persons. Thus, whenever a 1 appears we
can assume that some category is in a sense near some person and that the
relationship is symmetric. In essence, the $n_c \times N$ rectangular matrix E can be
thought of as a partial adjacency matrix for a graph whose dimensionality we seek.
Fortunately, Guttman (1964) has established the necessary theorems for defining the
dimensionality of graphs in terms of smallest space theory. We can now write
our specification as: given E, satisfy the inequality that whenever
$e_{pjc} = e_{qjc} + 1 = 1$ then $d_{pjc} < d_{qjc}$ for all $p \in P$, $c \in C_j$, and
$j \in J$ such that the loss function, normalized phi (v.i.), is minimized for a
specified m dimensions, where d is the Euclidean distance. We are defining binary
relations in terms of a distance function such that all points belonging to one set
(categories) which are in relation to points in the other set (persons) will have
smaller distances in the joint space of persons and categories than all points (one
from each set) which are not in relation, i.e., $e_{pjc} = 0$. Nothing in this
statement is implied about the relationships among categories, items, or persons
(this information does not exist, although it could be defined in terms of the
relationships between rows and columns of E) - all that does exist in E is the
category-person relation. By confining our attention to the inter-set relations,
however, we should be able to infer something about the intra-set relationships
that are implicit in E from the nature of our solution.

The  following  is  an  outline  of  the  MSA-II  algorithm  (Lingoes,  1967c). 

The MSA-II Basic Equations

STEP 1: Define an $n_c + N = k$-square symmetric matrix V whose first $n_c$ rows
(columns) represent the category captions and whose $n_c + 1$ to $k$ rows (columns)
represent the person captions of E. The basic data matrix appears as an
off-diagonal submatrix of V occupying rows 1 to $n_c$ and columns $n_c + 1$ to $k$. For
the elements of the submatrix E, whenever $e_{pjc} = 1$ substitute
$1 - (Nn+1)/(k(k-1))$ and whenever $e_{pjc} = 0$ substitute
$1 - (N(n+n_c)+1)/(k(k-1))$. All other off-diagonal elements of V are set equal to
$\tfrac{1}{2} - (n_c N+1)/(k(k-1))$. The k diagonal elements of V are calculated from
the following formula:

$$v_{ii} = k - \sum_{j=1}^{k} v_{ij} \qquad (i = 1, 2, \ldots, k;\ i \neq j).$$

The row (column) sums of V = k, the order of the matrix, and
tr(V) = 1 + k + k(k-1)/2. V is a Gramian matrix whose largest root = k and the
elements of its associated unit length vector = $(1/k)^{1/2}$, a constant.
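
As a check on STEP 1, the construction of V can be sketched in a few lines. This is an illustration under stated assumptions rather than the program's actual code: the function name build_v is hypothetical, every person is assumed to fall in exactly one category of each of the n items (so that E holds Nn ones), and the leading constant 1/2 for the remaining off-diagonal cells is inferred from the trace stated in the text.

```python
import numpy as np


def build_v(E, n_items):
    """Build the k-square matrix V of STEP 1 from the nc x N binary matrix E."""
    E = np.asarray(E)
    nc, N = E.shape
    k = nc + N
    pairs2 = k * (k - 1)  # twice the number of distinct pairs of points

    v_one = 1.0 - (N * n_items + 1) / pairs2          # cells where E is 1
    v_zero = 1.0 - (N * (n_items + nc) + 1) / pairs2  # cells where E is 0
    v_other = 0.5 - (nc * N + 1) / pairs2             # assumed constant 1/2

    V = np.full((k, k), v_other)
    block = np.where(E == 1, v_one, v_zero)
    V[:nc, nc:] = block    # categories by persons
    V[nc:, :nc] = block.T  # persons by categories

    # Diagonal: k minus the sum of the off-diagonal elements in the row.
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, k - V.sum(axis=1))
    return V


# Two items with two categories each, three persons (hypothetical data).
E_small = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]])
V_small = build_v(E_small, n_items=2)
```

With these substitutions every row of V sums to k, and the trace agrees with the value 1 + k + k(k-1)/2 given in STEP 1.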

STEP 2: Solve for: $U(V - \lambda I) = 0$, which yields the initial configuration
(see: Lingoes, 1967a), where I is the order k identity matrix and U and $\Lambda$,
respectively, are the vectors of unit length and the roots. Normalize the unit
length vectors to the size of their associated roots, i.e., $X = U\Lambda^{1/2}$.
Ignoring the constant vector, order vectors by their length from large to small.
The mean of each vector will be zero.

STEP 3: Calculate the ncN Euclidean distances between every category point,
on the one hand, and every person point, on the other, i.e., $d_{ij}$
$(i = 1, 2, \ldots, n_c;\ j = n_c+1, n_c+2, \ldots, k)$ based on the m dimensions of X,
where m = some predetermined number based upon either a parameter or the number of
vectors whose roots are k/2 (excepting the largest root).
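
STEPS 2 and 3 can be sketched with a standard symmetric eigensolver. The function names below are hypothetical, and two choices are assumptions of this sketch rather than statements about the original program: the constant vector is identified as the eigenvector most closely aligned with a vector of ones, and slightly negative roots arising from rounding are clipped to zero before taking square roots.

```python
import numpy as np


def initial_configuration(V, m):
    """Return the k x m initial coordinates X = U * sqrt(roots), constant vector dropped."""
    roots, vectors = np.linalg.eigh(V)
    order = np.argsort(roots)[::-1]  # largest root first
    roots, vectors = roots[order], vectors[:, order]

    # The constant vector has (nearly) equal elements, so its sum is largest.
    const = int(np.argmax(np.abs(vectors.sum(axis=0))))
    keep = [i for i in range(V.shape[0]) if i != const][:m]

    scale = np.sqrt(np.clip(roots[keep], 0.0, None))
    return vectors[:, keep] * scale


def cross_distances(X, nc):
    """Distances between the first nc points (categories) and the rest (persons)."""
    diff = X[:nc, None, :] - X[None, nc:, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# A small symmetric matrix whose rows all sum to 3 (hypothetical input).
V_demo = np.array([[2.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 2.0]])
X_demo = initial_configuration(V_demo, m=2)
D_demo = cross_distances(X_demo, nc=1)
```

Because the retained vectors are orthogonal to the constant vector, each column of X_demo has mean zero, as STEP 2 states.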


STEP 4: Permute the Nn smallest distances so that they occupy the same
positions that the 1s have in E and permute the remaining N(nc-n) distances to the
positions in E occupied by 0s. These cell-wise permuted distances are the rank
images (Guttman, 1967; Lingoes, 1967a) of the distances, the d*'s, which within
tied blocks (e.g., the block of 1s) have been ordered from low to high.
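
The rank-image permutation of STEP 4 can be sketched as follows. The function name rank_images is hypothetical, and the reading of "ordered from low to high within tied blocks" used here (cells within a block receive their share of the sorted distances in the order of their own current distances) is an interpretation of the text, not a confirmed detail of the program.

```python
import numpy as np


def rank_images(D, E):
    """Return d*, the distances permuted so the smallest fall on the 1s of E."""
    d = np.asarray(D, dtype=float).ravel()
    ones = np.asarray(E).ravel() == 1
    sorted_d = np.sort(d)
    n_ones = int(ones.sum())

    ones_idx = np.flatnonzero(ones)
    zeros_idx = np.flatnonzero(~ones)
    # Within each block, keep the order of the cells' own distances.
    ones_order = ones_idx[np.argsort(d[ones_idx], kind="stable")]
    zeros_order = zeros_idx[np.argsort(d[zeros_idx], kind="stable")]

    d_star = np.empty_like(d)
    d_star[ones_order] = sorted_d[:n_ones]   # smallest distances go to the 1s
    d_star[zeros_order] = sorted_d[n_ones:]  # the rest go to the 0s
    return d_star.reshape(np.shape(D))


D_demo = np.array([[3.0, 1.0],
                   [2.0, 4.0]])
E_demo = np.array([[1, 0],
                   [0, 1]])
D_star_demo = rank_images(D_demo, E_demo)
```

In this small case the two smallest distances (1 and 2) are moved onto the diagonal cells, where E has its 1s, and the two largest onto the off-diagonal cells.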

STEP 5: Calculate the normalized phi coefficient of monotonicity:


and the coefficient of alienation: $K = (1 - (1-\phi)^2)^{1/2}$, which permits us to
gauge how good our fit is in respect to reducing the error of estimate.

STEP 6: Calculate the coefficient of reproducibility:

$$R = 1 - \frac{\text{number of errors}}{2nN},$$

where errors are defined as the number of distances which are smaller than the
largest distance for 1s of E whose positions correspond to 0s plus the number of
distances which are larger than the smallest distance for 0s of E whose positions
correspond to 1s. This measure disregards the magnitudes of the errors implied in
the distances, being solely concerned with the number of such incorrect
predictions. When $\phi = 0$ it must be true that all distances between categories
and persons for which $e_{pjc} = 1$ are smaller than distances corresponding to cell
entries of E which are zero. We are thus defining the radius of a circle (more
generally that of a sphere) such that all points falling within that enclosure are
in relation to the point lying at the center. All points lying within the sphere
for which $e_{pjc} = 0$ plus all points lying outside the sphere for which
$e_{pjc} = 1$ are considered errors. R serves no functional purpose in the MSA-II
program, but is an interesting descriptive measure telling us how well we could
reproduce E.
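
The error count behind R can be sketched directly from the verbal definition in STEP 6. The function name reproducibility is hypothetical, and the denominator 2nN is taken as it appears in the text; the sketch follows the wording literally and is not claimed to match the program in every detail.

```python
import numpy as np


def reproducibility(D, E, n_items):
    """Coefficient of reproducibility R = 1 - errors / (2 n N)."""
    D = np.asarray(D, dtype=float)
    ones = np.asarray(E) == 1
    largest_one = D[ones].max()     # largest distance among the 1s of E
    smallest_zero = D[~ones].min()  # smallest distance among the 0s of E

    # 0-cells closer than the largest 1-distance, plus
    # 1-cells farther than the smallest 0-distance.
    errors = int((D[~ones] < largest_one).sum() + (D[ones] > smallest_zero).sum())
    N = D.shape[1]
    return 1.0 - errors / (2 * n_items * N)


E_demo = np.array([[1, 0],
                   [0, 1]])
R_perfect = reproducibility([[1.0, 3.0], [4.0, 2.0]], E_demo, n_items=1)
R_flawed = reproducibility([[1.0, 2.5], [4.0, 3.0]], E_demo, n_items=1)
```

In the second case one 0-cell (distance 2.5) lies inside the radius 3 and one 1-cell (distance 3) lies beyond the smallest 0-distance, giving two errors out of 2nN = 4.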

STEP 7: If we have satisfied a given number of iterations, or if $\phi$ is
sufficiently small, or if $\phi$ has not changed significantly over a number of
iterations, we can terminate the solution and then go either up or down in
dimensionality according to the same options as in SSA-I (Lingoes, 1965; 1967a). If,
however, none of these conditions prevail, proceed to the next set of steps for
modifying the coordinates for another iteration.

STEP 8: For each of the d*'s corresponding to 1s of E substitute their mean d*
and for each of the d*'s corresponding to the 0s of E substitute their mean d*. It
can be seen that we are tying all distances that should be tied such that when
a solution has been achieved ties will be broken in an optimal fashion.

STEP 9: Define a k-square symmetric matrix C, the correction matrix, which
is coordinate in respect to the partitions of V in STEP 1. Proceeding from top
to bottom, and within each, from left to right, let us number these partitions
thusly: I, II, III, and IV. The elements of these four partitions of C are:

Partition I of order $n_c$: $c_{ij} = 0$ $(i \neq j)$ and
$c_{ii} = n_c + \sum_{j=n_c+1}^{k} d^*_{ij}/d_{ij}$;

Partition II of order $n_c \times N$: $c_{ij} = 1 - d^*_{ij}/d_{ij}$;

Partition III of order $N \times n_c$: II' or $c_{ji}$ of II; and,

Partition IV of order N: $c_{ij} = 0$ $(i \neq j)$ and
$c_{ii} = N + \sum_{j=1}^{n_c} d^*_{ji}/d_{ji}$.

Each row (column) of C sums to the constant k and $c_{ij} = c_{ji}$. When $\phi = 0$,
C becomes a scalar matrix.

STEP 10: Compute a new trial set of coordinates by the following pivotal
formula:

$$x_{ia}^{(t+1)} = \frac{1}{k} \sum_{j=1}^{k} x_{ja}^{(t)} c_{ij},$$

where t = iteration number.
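
The correction of STEPS 9 and 10 can be sketched as below. The function names are hypothetical, and two points are assumptions: the diagonal elements are reconstructed so that each row of C sums to k (as the text requires), and the update is scaled by 1/k so that the coordinates are left unchanged when d* equals d. The sketch also assumes that no distance is zero.

```python
import numpy as np


def correction_matrix(D, D_star):
    """Build the k-square correction matrix C from distances and rank images."""
    D = np.asarray(D, dtype=float)
    ratio = np.asarray(D_star, dtype=float) / D  # assumes no zero distances
    nc, N = D.shape
    k = nc + N

    C = np.zeros((k, k))
    C[:nc, nc:] = 1.0 - ratio      # partition II
    C[nc:, :nc] = (1.0 - ratio).T  # partition III, the transpose of II
    # Partitions I and IV are diagonal: nc (or N) plus the sum of the ratios.
    diag = np.concatenate([nc + ratio.sum(axis=1), N + ratio.sum(axis=0)])
    C[np.arange(k), np.arange(k)] = diag
    return C


def update_coordinates(X, C):
    """One correction step; the 1/k scaling is an assumption of this sketch."""
    return C @ X / C.shape[0]


D_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
C_fixed = correction_matrix(D_demo, D_demo)  # perfect fit: d* equals d
C_moved = correction_matrix(D_demo, [[2.0, 1.0], [3.0, 4.0]])
X_demo = np.arange(8.0).reshape(4, 2)
X_next = update_coordinates(X_demo, C_fixed)
```

When d* equals d the matrix C reduces to k times the identity, so the coordinates are a fixed point of the update, which is consistent with the remark that C becomes a scalar matrix at a perfect fit.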


STEP 11: Calculate the ncN distances based upon the transformed coordinates
and go back to STEP 4 for another iteration. As an alternative to doing just one
least squares adjustment for every rank image permutation, one could do ten least
squares corrections for every permutation, as is done in most of the smallest
space programs, by returning to STEP 8, after which one would return to STEP 4.

In summary, based on the binary attribute matrix E and using a trial set
of coordinates which are a function of the ranks of the values in E, we specify
an orthogonal solution in a minimum number of dimensions such that for all pairs
of categories and persons it will always be true that whenever
$e_{pjc} = e_{qjc} + 1 = 1$ then $d_{pjc} < d_{qjc}$. Short of perfect monotonicity,
however, for the given dimensionality we will minimize the function $\phi$. When
the process of least-squares rank-image permutations converges, we attain a
representation in a joint Euclidean space of the two sets of points (categories and
persons), such that having defined the largest distance of all points which form a
binary relation (the 1s of E) we are able to draw spherical boundaries, using each
point in turn as a center. The radius corresponding to the largest distance will
enclose all points in relation to the center point, thus permitting us to reproduce
the original response matrix.

Rather than employing the spherical boundaries outlined above, however, one
could partition the joint space by finding those hyperplanes which bisect pairs
of points within, but not between, items. These hyperplanes would then cut out
regions of the space having linear rather than circular boundaries. From either
conception one could determine to what extent E is reproducible. The curved
boundary formulation, however, is more easily implemented and produces a less
cluttered picture.

The dimensions of the MSA-II solution (or some rotation thereof) are primarily
meaningful in terms of the configuration of person points, although under
some circumstances where such a configuration allows parallel straight line
boundaries, one may gain some insight into item and category structure (v.i.).


An  MSA-II  of  Social  Structure 


For comparison purposes we will present an MSA-II analysis of the same data
analyzed by MSA-I involving seven types of social groups defined by five kinds of
characteristics. Once again a two-space is perfectly adequate to portray all the
interrelationships of E (Figure 2 below).


Figure  2  about  here 


It will be noted that Figure 2 contains only the person points since this
aspect interests us most. A comparison of the MSA-I and II configurations of
the seven groups reveals a remarkable similarity between the two, although the
rationales of these two methods differ greatly. A slightly tilted Y is apparent,
Primary Group is closest to Mob, and a circular order among the categories is
evident for the five items among the groups arrayed on the V portion of the Y.
Given the profiles of these seven groups a set of linear parallel boundaries
can be constructed such that for each item all individuals falling in a particular
category will be contiguous. The partitions of this space (despite the
configural similarities to Figure 1) are different from the results of MSA-I. Thus,
the item and the within item orderings are: Physical Proximity (ab), Formality
of Relationship (abc), Frequency of Interaction (cbda), Intensity of Interaction
(cbad), and Feeling of Belonging (cbad). As was mentioned before, there would
appear to be some ambiguity in respect to category ordering within items,
permitting alternative solutions. Furthermore, based upon the MAC-II category
weights there is a strong suspicion that curvilinear relationships exist among these
variables giving rise to the differences noted between Guttman's hand solution
and the MSA-I and II solutions alike.

Although this one example is insufficient for making any inferences about
what will happen in general when MSA-I is compared with MSA-II, certain
observations having both a practical and theoretical import are relevant.

Some  Comparisons  Between  MSA-I  and  MSA-II 

First, in respect to the size of a problem that can be analyzed by these two
scalogram programs, MSA-I has a greater capacity (i.e., up to >0 variables, with
as many as 20 categories for each, and up to 60 types) than MSA-II, which is
restricted to $n_c + N \leq 80$. Second, MSA-II, being based upon a much simpler
algorithm, is considerably faster than MSA-I for problems of the same magnitude.
Third, the simpler contours for the boundaries of MSA-II are more easily depicted
and the resulting representation is easier to grasp vis-a-vis the basic data matrix
E. Fourth, MSA-II would seem to have more applications than MSA-I, since (with a
minor modification) the former is not restricted to mutually exclusive categories.
Fifth, in respect to the criterion of reproducibility, MSA-I reserves a subset of
the person points for defining regions and these points are not considered in
computing the coefficients of either contiguity or reproducibility. In contrast,
MSA-II (at the expense of including a set of points for categories) does include
all person points in determining reproducibility and, as such, is more analogous to
Guttman's original conception of unidimensional scalogram analysis (1944) and
Lingoes' generalization thereof for multiple unidimensional scalogram analysis
(MSA) for binary data (1960; 1963a).

Both procedures, starting with the same general data matrix E, are ideally
suited for the multidimensional analysis of qualitative data and for quantitative
data where the distributional and linear assumptions of standard multivariate
techniques cannot be met or are questionable. Based upon quite different
definitions and specifications (MSA-I involving a definition of contiguity in terms
of outer-points, for example, and MSA-II being based upon the logic of smallest
space analysis (see: Lingoes, 1967b for a review) and a definition of distance
and dimensionality for graphs (Guttman, 1964)), the two procedures would appear
to yield essentially the same results. Further analyses, however, are necessary
before reaching a final conclusion on this point. There may well be certain kinds
of data which are more economically represented by one procedure than the other.

Summary 

Three methods for analyzing qualitative data were introduced: 1) Multivariate
Analysis of Contingencies - II (based on the early work of Guttman, 1941),
2) Multidimensional Scalogram Analysis - I (involving a unique definition of
contiguity which presupposes a minimum of assumptions), and 3) Multidimensional
Scalogram Analysis - II (involving a graph theoretic and smallest space logic).
An outline of the basic equations and assumptions of each was presented. One
example of a conceptual data matrix was analyzed by both MSA-I and MSA-II and
the results were discussed vis-a-vis a hand analysis of the same data based upon
a linear ordering of the categories involved. Finally, some comparisons between
the two scalogram procedures were made.

In conclusion, the three methods discussed in this paper for the
multidimensional analysis of both qualitative and quantitative data and of both
linear and nonlinear relationships are based upon a minimum number of assumptions
(more consonant with our usual ignorance regarding the metric and distributional
properties of social science data). One would anticipate, therefore, an increasing
use of these and similar techniques (e.g., the various programs developed by
Shepard, 1962; Shepard & Carroll, 1966; Shepard & Kruskal, 1964; Kruskal, 1964
and McGee, 1967). Shepard's 1962 breakthrough paper provided much of the impetus
for these current developments in nonmetric methodology (Lingoes, 1967b).


"CLASSIFICATION SO AS TO RELATE TO OUTSIDE VARIABLES"


Edward W. Forgy
UCLA 


Introduction: 

Let me first explain a bit of history behind the title. About a year ago
in Washington, D.C., there was a conference on classification in psychiatry. (1)
One of the desirable qualities for any classification system that was
unanimously agreed upon by the conferees was that the classes should be relevant to
a number of other qualities about persons beyond the information that went into
making the classification or diagnosis itself.

At this same conference, a number of empirical papers (2,3,4,5,6) were
presented  in  which  typologies  were  developed  from  various  kinds  of  data  by 
several  computer  techniques.  In  most  of  these  studies,  after  a  classification 
system  was  developed,  it  was  then  evaluated  to  some  degree  by  relating  it  to 
outside  variables  not  used  in  building  the  system. 

I couldn't help being struck then by the real absence of any relation
between what classifications were hoped to accomplish and the methods used to
develop them. In no case did any outside variables enter in any way into the
computing that developed the classification systems. This is analogous to a
situation in which relevant linear functions of variables (rather than
classifications) were desired, but factor analysis of predictor variables was
relied upon exclusively as the method of obtaining the linear functions. Given
a choice, most of us would naturally put available criterion variables into the
analysis as dependent variables. Then, if such relations are in the data the
method will find them, and the linear functions that emerge will be
systematically related to the dependent variables of interest. Analogously, I
couldn't help thinking that a systematic effort to find relevant classification
systems would probably be much more successful than what have been essentially
random efforts with respect to relevance.

There  are,  I  think,  several  kinds  of  reasons  to  account  for  the  absence  of 
clustering  methods  directed  at  relevance  to  outside  measures.  It  appears  that 
many  investigators,  at  least  psychologists  who  use  cluster  analysis,  tend  to 
believe  that  the  computer  has  revealed  some  sort  of  natural,  pre-existing 
typological  structure  in  their  data.  Believing  --or  perhaps  really  only  hoping 
--  this,  they  consider  relevance  to  particular  outside  variables  a  somewhat 
secondary  issue. 

The comment by Dr. Sokal on the tendency of various clustering methods to
bias the results is most pertinent here. As he pointed out, each method tends
to impose a certain kind of structure upon the data, whether that structure is
really present or not. I think this applies with even more force to
psychological data. In biology, the question is which arrangement of classes fits
the data best. In psychology, I think the basic, usually unasked question is
whether any structure of types or classes is called for to describe the data.

To help answer this, I'd strongly recommend a simple experiment to anyone who
cluster-analyses his data in order to understand it.

1. Generate some samples of artificial "cases" from a
unimodal, non-clustered, non-nested population,
for instance, a joint normal population with the
same covariances as the actual data.

2. Put these through the same cluster analysis process
as your real data.

The subtypes that may well be "discovered" in a sample from this classic,
classless population would provide a helpful baseline against which to compare
the results from the real data.
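
The first step of this baseline experiment can be sketched as follows. This is only an illustration: the function name null_sample and the small data array are hypothetical, and the sketch matches the means as well as the covariances of the real data, which goes slightly beyond what the text specifies. The resulting artificial cases would then be passed through whatever clustering procedure was applied to the real data.

```python
import numpy as np


def null_sample(X, n_cases, seed=0):
    """Draw artificial cases from a joint normal population with the
    sample mean and covariance matrix of the real data X (cases by variables)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_cases)


# Hypothetical "real" data: six cases measured on two variables.
X_real = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
X_null = null_sample(X_real, n_cases=20000)
```

Any "types" found in X_null are artifacts of the method, since the population from which it was drawn has a single mode and no classes.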

It is also true that relevance to outside variables is only one of the
desirable qualities of a typology, and in addition the most desired kind of
relevance is to some broad, usually undefined set of factors, rather than to
a particular outside variable. However, it remains that, while some kind of
relevance to outside variables is strongly desired for classifications, the
main thing depended upon to get it is extraordinary good luck. This being so,
I felt that it might be worthwhile to attempt to develop something to improve
the odds in favor of an investigator who wants some sort of relevance.

The  Concept  of  Maximally  Relevant  Classes: 

Here another acknowledgment is due, in this case a very delayed appreciation
of another person's idea. About five years ago a UCLA colleague, Dr. James
McQueen (7), distinguished several possible goals of classification. One of
the goals spelled out was that class membership could be used to estimate or
predict some outside variable with minimum error. If the outside variable is
a quantitative one, on an interval scale, then the familiar success criterion
of minimising the squared errors can be used. Then the most relevant
classification would be that which explains the largest amount of variance of
the dependent variable, just as in linear regression.
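
The criterion just described, the proportion of the variance of the outside variable explained by class membership, can be written out in a few lines. The function name variance_explained is hypothetical, and the sketch assumes that Y is not constant, so that the total sum of squares is nonzero.

```python
import numpy as np


def variance_explained(y, labels):
    """Proportion of the variance of y accounted for by class membership."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    total = ((y - y.mean()) ** 2).sum()
    # Squared errors left when each case is predicted by its class mean.
    within = sum(
        ((y[labels == g] - y[labels == g].mean()) ** 2).sum()
        for g in np.unique(labels)
    )
    return 1.0 - within / total


y_demo = [1.0, 1.0, 3.0, 3.0]
relevant = variance_explained(y_demo, [0, 0, 1, 1])    # classes separate Y perfectly
irrelevant = variance_explained(y_demo, [0, 1, 0, 1])  # classes tell nothing about Y
```

A classification whose classes have identical means on Y explains none of the variance, however well separated the classes may be on the X's.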

This is about the simplest kind of relevance possible -- that class
membership itself and alone will enable one to make good predictions about one other
characteristic. Because it is so simple, it will be the problem dealt with in
this paper.

This is not to deny that broader or more subtle kinds of relevance exist.
Two kinds of extensions are obvious:

1) Asking for the same kind of relevance but to a larger
set of variables.
2) Asking for a different kind of relevance, i.e. that class
membership will sharpen the predictive value of other
variables, even though class may predict poorly itself.

Discussion of these will be postponed beyond this paper.

The Nature of the Classification Rule:

By this is meant a rule which will assign any individual with information
on a set of variables (the X's) to one of several mutually exclusive classes.
The forms such classification rules might take are quite varied, and some
choice must be made before any method to maximise relevance can "get off the
ground." One particular form of rule will be outlined here, and it will be
used as the basis of a procedure to seek maximum relevance for its classes.

Let us specify one location in X-space for each group. Each location
would of course be defined by as many numbers (co-ordinates) as there are
X-variables. The class membership of any case can be determined simply by seeing
which location is nearest to that case. We will use Euclidean distance as a
measure. Parenthetically, by various transformations of the X's beforehand,
Euclidean distance can be made to reflect almost any desired kind of similarity.

Each location need not be close in an absolute sense to the members of its
group, but it would of course always lie in the same region. The boundaries
in X-space defined by this classification rule would be straight lines, planes,
or hyperplanes; they are segments of the perpendicular bisectors between the
various locations. The regions of X-space produced would sometimes be bounded
on all sides, sometimes not. Regions would not overlap, and each region would
always be convex. Such a system, while logically simple, still permits a fair
degree of flexibility in the size and shape of possible regions which define
classes. A set of locations, once defined, can then be used later to classify
other cases with the same X-measurements -- i.e. the system can be
"cross-validated" -- and its relevance to various other outside variables may
be seen. The mean Y value of each class can be used to make absolute
predictions of Y for new cases.
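
The rule just described can be sketched in a few lines. This is only an
illustration under stated assumptions, not the Y-GROUPS program: the function
names, the toy data, and the two locations are all invented for the example,
and ties between equally near locations are arbitrarily given to the
lower-numbered class.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_location(x, locations):
    """Index of the location nearest to case x (ties go to the lower index)."""
    return min(range(len(locations)), key=lambda k: distance(x, locations[k]))

def class_means(X, Y, locations):
    """Mean Y of the cases assigned to each class (None for an empty class)."""
    sums = [0.0] * len(locations)
    counts = [0] * len(locations)
    for x, y in zip(X, Y):
        k = nearest_location(x, locations)
        sums[k] += y
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def predict(x, locations, means):
    """Predict Y for a new case as the mean Y of the class it falls in."""
    return means[nearest_location(x, locations)]

# Hypothetical data: six cases on two X-variables, with an outside variable Y.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
Y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
locations = [(0.0, 0.0), (10.0, 10.0)]   # one defining location per class

means = class_means(X, Y, locations)
prediction = predict((8.0, 8.5), locations, means)
```

Note that the second location, (10, 10), is not itself a data point and is not
the centroid of its class; it only has to lie nearer to its members than any
other location does.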

Such a system is really a species of nonlinear prediction function, and it
may be evaluated in comparison with other sorts of predictions; for instance,
linear regression, quadratic regression, etc. Thus not only may the relevance
of various systems of this sort be compared, but the efficiency of the whole
classification approach may be compared to the efficiency of other ways of
making predictions.

Such  comparisons  would  probably  be  a  very  healthy  thing  for  the  field  of 
cluster  analysis.  They  would  give  us  a  more  realistic  idea  of  what  we  are  and 
are  not  accomplishing.  Predicting  via  discrete  classes  has  both  potential 
advantages  and  disadvantages  when  compared  to  other  ways  of  using  information 
to  make  specific  forecasts. 

The nature of the relation of Y to the X's obviously affects the relative
success of using classes vs. the more usual kinds of algebraic functions to
predict Y. This is seen most clearly when there is only one X-variable so that
relations may be drawn on an ordinary graph. If the relation consisted of
several horizontal line-segments with sharp discontinuities in between, then Y
could be predicted perfectly via discrete classes. On the other hand, if it
were a single slanted straight line, then predicting Y by a linear function of
X would be better than using any finite number of discrete classes. Smooth but
curvilinear relations could give various results, depending on the degree of
non-linearity, complexity of the curve, number of classes used, etc.
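
The one-variable contrast above can be checked numerically. The following is
a small sketch with invented data (a three-step relation and a straight
line); the three class locations and the helper names are assumptions made
for the illustration only.

```python
def variance(v):
    """Population variance of a list of numbers."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def r2_classes(x, y, locations):
    """Share of Y variance accounted for by predicting each class's mean Y."""
    groups = {}
    for xi, yi in zip(x, y):
        k = min(range(len(locations)), key=lambda j: abs(xi - locations[j]))
        groups.setdefault(k, []).append(yi)
    within = sum(len(g) * variance(g) for g in groups.values()) / len(y)
    return 1.0 - within / variance(y)

def r2_linear(x, y):
    """Share of Y variance accounted for by a least-squares straight line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

x = [float(i) for i in range(12)]
step_y = [0.0] * 4 + [5.0] * 4 + [1.0] * 4    # horizontal segments with jumps
line_y = [2.0 * xi + 1.0 for xi in x]         # single slanted straight line
locations = [1.5, 5.5, 9.5]                   # three classes of four cases each

step_by_classes = r2_classes(x, step_y, locations)
step_by_line = r2_linear(x, step_y)
line_by_classes = r2_classes(x, line_y, locations)
line_by_line = r2_linear(x, line_y)
```

With these data the classes reproduce the step relation exactly while the
straight line does not, and the reverse holds for the slanted line, where
three classes leave some within-class variance unaccounted for.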

Non-linear algebraic functions of the X's -- exponential terms, etc. --
would sometimes do better than either classes or linear terms. However, there
is no real limit to the possible variety and complexity of non-linear
functions, but there usually is a limit to the size of the data sample.

We shall now return to the problem of how best to predict Y via classes
defined on the X's.

The  Approach: 

To make the problem of maximizing the relevance of such a classification
rule a tractable one, we need to limit the number of classes. If there are as
many classes as cases in the data sample, then we have the Nearest-Neighbor
prediction rule of Fix and Hodges (8), which again indicates the overlap
between this kind of cluster analysis and prediction theory. But when a data
sample contains hundreds or thousands of cases, that procedure becomes an
increasingly unwieldy classification or prediction system; all the work is
in using the system, rather than in developing it. One would expect
cross-validation performance to be poor, and such a system itself is not at all
interpretable or easy to describe and communicate. The optimum number of
classes would depend upon the relative gain in explanation or prediction from
additional classes versus the loss in the form of extra mechanical or conceptual
effort required to use it.

For the present we shall dodge this problem by holding the number of classes
constant, and then solving for maximum relevance. Of course, a variety of
solutions with different numbers of classes could give an indication of the
point of diminishing returns in any particular applied problem.

When N is over 30 or so, I am sure that finding the optimal classification
into some given number of groups is beyond us for some time, just as is the
simpler problem of finding the minimum-variance partition among the X's alone.
However, as in the latter case, it may be possible to find some very good
solutions -- with a fair degree of certainty -- even if not necessarily the
best solution that we are accustomed to getting from least-squares methods.

On the minimum-variance problem, I found (9,10) that one way of getting a
number of good solutions is to start with a number of poor solutions: for
instance, with some arbitrary or random partitions, or with results from
another method susceptible to improvement. Then I applied various improvement
algorithms, making changes only when there was a demonstrable gain from doing
so. One of the surprising (to me) results of such a procedure was that the
goodness of the final classification depended very little, and not at all
systematically, upon the starting-point. The computer program very rarely got
"hung up" upon a very poor solution, regardless of what it started from.
Hopefully this may eventually be true of the same approach applied to the
present problem, even though the improvement algorithms will differ.

Starting classes as well as final solution classes will be defined in terms
of point locations (which we shall call P's), one for each group. A convenient
source of "reasonable" P's -- that is, ones which produce groups all with at
least one member -- is the data sample itself. If the P's are simply set equal
to the X-coordinates of certain cases (perhaps chosen at random), then we have
a starting classification with the desired geometric properties and necessarily
with at least one member in each class. Alternatively, prior information might
be used to define the starting P's directly. Another starting point would be
to perform a minimum-variance type of clustering on the X-variables alone. If
a single starting point is desired this might well be a very sensible one.

However, if a number of widely different starting points are desired, they could
be achieved most easily by a process of generating random selections of case
numbers and setting the defining locations equal to the X-values of these cases.
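
A minimal sketch of generating several random starting points in this way is
given below. The helper names, the seeds, and the data are hypothetical, and
the guarantee of at least one member per class assumes that the selected
cases have distinct X-values.

```python
import math
import random

def distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def random_start(X, n_classes, rng):
    """Choose n_classes distinct cases at random and use their X-values as P's."""
    chosen = rng.sample(range(len(X)), n_classes)
    return [tuple(X[i]) for i in chosen]

def assign(X, P):
    """Class number of each case: the index of its nearest defining location."""
    return [min(range(len(P)), key=lambda k: distance(x, P[k])) for x in X]

# Hypothetical sample of six cases on two X-variables.
X = [(0.0, 1.0), (2.0, 0.5), (3.0, 3.0), (5.0, 1.0), (6.0, 4.0), (8.0, 2.0)]

# Five different starting classifications into three classes each.
starts = [random_start(X, 3, random.Random(seed)) for seed in range(5)]
memberships = [assign(X, P) for P in starts]
```

Because every starting P coincides with one of the cases, that case is at
distance zero from its own P and so is always assigned to it.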

The Improvement Algorithm:

Central to the whole process is a fairly economical way of exploring a
limited set of possible changes in the definition of classes, evaluating such
changes, and making them whenever the between-class variance of Y is increased.

Given the chosen way of defining classes, an obvious way of changing the
classification system is to move one or more of the locations. But in what
direction, and how far? In multivariate data there is an infinity of
different directions that might be tried, and in each of these there is no
guarantee that, as a location is moved, the variance of Y accounted for will
change in a simple way. On the contrary, given the discrete nature of data
points, the occurrence of multiple minima is to be expected.

On  the  question  of  how  far  to  move,  fortunately  there  are  some  natural 
limits.  Moving  in  any  direction,  there  is  a  limit  beyond  which  a  defining  point 
will  cause  some  group  (usually  its  own)  to  have  no  members  at  all.  Moving  past 
this  limit  would  be  pointless,  since  it  changes  the  nature  of  the  problem  by 
reducing  the  number  of  classes.  In  some  directions  there  will  be  a  closer 
"natural"  limit,  because  continued  movement  would  lead  to  crossing  the  old 
boundary  of  the  region  defined  by  its  original  position.  Making  this  boundary 
an  additional  limit  for  a  move  would  have  the  effect  of  allowing  only  moderate 
changes  in  the  classification  system  during  any  one  cycle  of  the  iteration. 

Now back to the question of direction for the move. Imagine a region with
large Y variance, and in which Y is somewhat systematically related to the X's.
Such a region is a good candidate for being "purified" -- made more homogeneous
on Y -- which would improve the whole system. If the relation of Y to the X's
is not too complex within each region, then fitting a regression plane to it
would be a fairly accurate summary of this relation. (In practice, the slope
coefficients for the regression planes should be computed in a "stepwise"
fashion, so that there will be a solution even when a group has fewer points
than there are X-variables.) This regression plane might provide a promising
direction for a future move of the location. If it is moved "up" the plane
(i.e. in the direction where high-Y values are found), then it will often change
the boundaries in such a way as to leave some of the low-Y cases behind.
If the location is moved "down" the plane (toward low-Y cases) then it will
often act so as to leave the high-Y cases behind it. In some cases, moving
the location in either direction may improve the classification system. The
only feasible way to survey the effects of moving locations seems to be to
evaluate the whole classification system at a number of specific moves --
perhaps at equal intervals between the permitted limits -- and choose the best
location found.

The  simplest  way  to  move  would  be  to  move  one  group  at  a  time,  and  proceed 
through  the  groups  sequentially.  If  any  group  is  improved  by  a  move,  then  it 
may  be  worthwhile  to  cycle  back  and  go  through  the  groups  again,  because  the 
move  of  a  single  group  may  change  the  boundaries  and  thus  the  membership  of  all 
the  other  groups.  The  process  terminates  when  an  entire  cycle  of  groups  has 
been  gone  through  with  no  further  improvement. 
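
The cycle described above can be sketched as follows. This is a simplified
illustration, not the Y-GROUPS program, and it departs from the text in ways
that should be kept in mind: the direction of a move is taken from the
within-class covariances of each X with Y rather than from stepwise
regression slopes; the range of a move is simply plus or minus the distance
to the farthest current member rather than the "natural" limits; and ties in
distance go to the lower-numbered class. All names and data are hypothetical.

```python
import math

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(X, P):
    """Class of each case = index of its nearest location."""
    return [min(range(len(P)), key=lambda k: distance(x, P[k])) for x in X]

def between_ss(Y, labels, n_classes):
    """Between-class sum of squares of Y; None if any class is empty."""
    grand = sum(Y) / len(Y)
    total = 0.0
    for k in range(n_classes):
        ys = [y for y, lab in zip(Y, labels) if lab == k]
        if not ys:
            return None
        total += len(ys) * (sum(ys) / len(ys) - grand) ** 2
    return total

def direction(X, Y, idx):
    """Unit vector of within-class X-Y covariances (a stand-in for the slopes)."""
    n = len(idx)
    my = sum(Y[i] for i in idx) / n
    dims = range(len(X[0]))
    mx = [sum(X[i][j] for i in idx) / n for j in dims]
    cov = [sum((X[i][j] - mx[j]) * (Y[i] - my) for i in idx) for j in dims]
    norm = math.sqrt(sum(c * c for c in cov))
    return [c / norm for c in cov] if norm > 0 else None

def improve(X, Y, P, steps=8, max_cycles=100):
    """Move one location at a time; keep a move only if the criterion rises."""
    P = [list(p) for p in P]
    best = between_ss(Y, assign(X, P), len(P))
    if best is None:
        raise ValueError("every starting class needs at least one member")
    for _ in range(max_cycles):
        improved = False
        for g in range(len(P)):
            labels = assign(X, P)
            idx = [i for i, lab in enumerate(labels) if lab == g]
            d = direction(X, Y, idx)
            if d is None:            # Y already constant within this class
                continue
            reach = max(distance(X[i], P[g]) for i in idx)
            old, best_move = P[g], None
            for s in range(-steps, steps + 1):   # equal intervals, both ways
                if s == 0:
                    continue
                P[g] = [o + reach * s / steps * dj for o, dj in zip(old, d)]
                score = between_ss(Y, assign(X, P), len(P))
                if score is not None and score > best + 1e-12:
                    best, best_move = score, P[g]
            P[g] = best_move if best_move is not None else old
            improved = improved or best_move is not None
        if not improved:             # a full cycle with no gain: stop
            break
    return P, best

# Hypothetical one-variable problem: Y jumps from 0 to 10 between X=4 and X=5.
X = [(float(i),) for i in range(8)]
Y = [0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
final_P, final_ss = improve(X, Y, [(0.0,), (3.0,)])
final_labels = assign(X, final_P)
```

On this toy problem the second location is moved twice, and the process stops
once Y is constant within both classes, so that no further direction of
movement is suggested.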

The  "Y-GROUPS"  Computer  Program: 

The two flow charts show the structure of the program and of the central
"MOVE" process. The user's description will give an idea of the various options
and limitations of the present program. It was written in Fortran IV, and has
performed on the IBM 360/75 at the Health Sciences Computing Facility at UCLA.
It is still under development, but copies of the program may be obtained from
the writer.

Performance  of  "Y-GROUPS"  on  Test  Problems: 


Artificial data in known configurations involving 1, 2, and 4 X-variables
were generated and used to test the algorithm, which proceeded from five random
starting partitions on each problem. The results were excellent when no "noise"
X-variables were present, but somewhat uneven over the several (random) starts
when half or more of the X-variables were unrelated to Y in any fashion. All
problems involved nonlinear relations of the X's to Y, and in all even the
worst "Y-GROUPS" solutions were superior to linear regression in terms of the
amount of Y variance accounted for.

Discussion: 


While the present "Y-GROUPS" Program handled the test problems fairly well,
it is only a first step toward a dependable method to establish maximally
relevant classes. Some detailed analysis of its occasional "mistakes" on the
several test problems should lead to improvements in the algorithm or perhaps
even to a better general approach.

While it is, to my knowledge, the only cluster analysis method directed at
relevance to outside measures, a number of developments in other fields have
goals that are quite similar. For example, the "learning machines" of Hunt (12)
and others establish a very different kind of classification rule in X-space --
one composed of logical "and's" and "or's" -- and make it successively more
complicated until it achieves perfect relevance within the data sample. The
many configural scoring and pattern prediction methods of psychologists are
also pertinent, though we do not think of them as defining classification
systems. However, again they would form very complex and logically "messy"
typologies, and their cross-validation performance has usually been quite
disappointing. Yet another heading under which related work is being done is
that of "pattern recognition" (13). And the problem of finding the optimum way
to stratify a sample (14) is essentially similar, although I believe it has
been handled neatly only in situations with one variable.

Thus,  while  there  is  not  yet  much  being  directed  at  making  simple  clusters, 
classes,  or  types  relevant  to  outside  measures,  there  has  been  a  great  deal  of 
effort  and  accomplishment  at  making  other  functions  of  multivariate  data  as 
relevant  as  possible  to  a  wide  variety  of  outside  variables.  There  is  almost 
certainly  much  to  be  learned  from  this  work. 

And even with an imperfect method in hand, I think there is also much that
could be learned by applying it to empirical data which have been analyzed in
enough other ways so as not to be a completely unknown quantity. I hope to
do some of this myself in the near future, with some MMPI data in which relations
are alleged to be distinctly nonlinear. Looking at the same body of data from
several different perspectives will certainly help us understand clustering
methods better than we do, and this should eventually help the empirical
investigator come to more reasonable conclusions about what his data mean.

Two distinct problems in the methodology of cluster analysis have
become apparent in this conference. The first problem concerns what is
clustered and the second concerns how. In general, most profile clustering
techniques involve first the computing of a matrix of similarity indices
among all possible pairs of profiles and secondly the analysis of
this matrix to identify subsets or clusters characterized by relative
homogeneity within cluster and relative independence between clusters.
Most discussion has been directed at the problem of how to identify
homogeneous clusters, with the tacit recognition that most methods can be
applied to a variety of different kinds of profile similarity measures;
however, important questions exist concerning the meaningfulness and
psychometric properties of the profile similarity indices that provide
a basis for clustering. If clustering is to be meaningful and valid,
reliability must be an important consideration in choice of the profile
similarity index to be used.

The distance between two multivariate profiles can be considered
to be a measurement statistic. This paper is concerned with an empirical
investigation of distance function reliabilities, or more specifically
with the consistencies between interprofile distances derived from rating
profiles provided by two independent observers for the same samples
of subjects. Considering interprofile distance to be a measurement,
the investigation is concerned with simple interrater reliabilities of
distances computed in different ways.

Several different methods for computing interprofile similarities
which can be conceived as representing Pythagorean distances in Euclidean
geometric space have been proposed in the literature. They differ
in manner of defining coordinate axes and in the extent to which
properties of the geometric model are identified with properties of the
measurements. In order for a geometric system to serve as a model, certain
points of coincidence need be established between the abstract geometric
model and the measurement domain it is supposed to represent.

With regard to univariate measurement scaling, Stevens (1952)
discussed the problem of establishing points of coincidence, isomorphism,
between real world and model. Different levels of measurement -- nominal,
ordinal, interval and ratio scales -- represent different degrees of
correspondence between the abstract number system and the real world it is
supposed to represent. In the measurement of distances between multivariate
profiles, a similar problem exists. On the one hand is the abstract
Pythagorean model with mutually orthogonal reference axes of unit scale
interval; on the other hand is a multivariate measurement domain. If
the geometric system is to serve as a model, certain points of coincidence
must be established. It is clearly nonsense to say that "we may legitimately
employ the Pythagorean model to calculate interpoint distances
without concern for the degree of correlation among profile elements
provided it is assumed that our coordinate axes are mutually orthogonal"
(Heermann, 1965). The coordinate axes in the model are orthogonal, and
no assumption is involved there. To what properties or characteristics
of nature does the "assumption" apply?

The lowest level of correspondence, perhaps best conceived as
analogous to nominal scaling, involves associating with each orthogonal
axis of the model a single measurement variable, without regard for
correlations among measurements or comparability of scale units. At this
lowest level of isomorphism, the angles between the reference axes and the
units of distance along the axes have no meaning with regard to statistical
properties of the measurements. While it is true that we can use
the simple Pythagorean formula to calculate interprofile distances without
necessity for associating geometric angles and axis lengths with
any properties of the data, the meaningfulness of such calculations
appears questionable (Overall, 196**).

The degree of correspondence between abstract geometric model and
measurement domain can be increased by equating statistical properties
of the data with orthogonality of reference axes and with units of length
of the geometric axes. For example, reference axes can be associated with
statistically independent, equal-variance transformations of the original
data. The orthogonal transformations can be obtained in a variety of
ways, and different numbers of transformed orthogonal variates can be
employed in computing interprofile distances. The variations in nature
and number of transformed variates provide different distance measures
which have been the subject of this investigation. Is orthogonal
transformation useful? If so, what kind and how many transformed variates
should one use? Reliability is one criterion to consider in evaluating
answers to these questions.

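The first distance index of interest is the simple $d^2$ statistic
(Cronbach and Gleser, 1953).

(1)  $d^2 = d_1^2 + d_2^2 + \cdots + d_p^2 = \mathbf{d}'\mathbf{d}$

where $d_j$ is the difference between the two profiles on element $j$ and
$\mathbf{d}$ is the column vector of these $p$ differences.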

Recognizing that profile elements may be correlated to differing
and unknown extent and that units of measurement may lack comparability,
an orthogonal transformation of the original p correlated measurements
may be sought to yield a new set of p uncorrelated, equal-variance
transformations of the p correlated profile variates. Such a transformation
can be obtained using the inverse covariance matrix. The distance function
computed from the transformed variates will be called a Mahalanobis-type
$D^2$. (Since the Mahalanobis distance between groups is something
quite different from this simple interprofile distance, we are probably
doing the late Professor Mahalanobis a disservice in using this
terminology.)

(2)  $D^2 = \mathbf{d}' C^{-1} \mathbf{d}$

Investigation  of  Reliability  of  Different  Profile  Similarity  Indices 
John  E.  Overall 


Finally, another orthogonal transformation of the original profile
elements based on factor analysis of the correlation (covariance) matrix
has been suggested by the present investigator as having certain appeal.
If the total variance is employed in the principal diagonal of the matrix
of intercorrelations among profile elements, factor variates which have
statistical properties of orthogonality and equal variance can be obtained
(Overall, 1962). Distances between profiles can be computed using the
Pythagorean model such that angles between reference axes and unit axis
lengths have meaning in terms of the statistical properties of the
transformed variates. In addition, the factor variates may have meaningful
psychological interpretation, increased measurement reliability and other
desirable properties.

(3)  $D_F^2 = d_{F_1}^2 + d_{F_2}^2 + \cdots + d_{F_r}^2 = \mathbf{d}' W W' \mathbf{d}$

where $W = C^{-1} F$ (for factors extracted from a covariance matrix),
or where $W = C^{-1} V F$ in which $V$ is a diagonal matrix containing
test standard deviations (for factors extracted from a correlation
matrix).

Relationships between the Three Indices of Profile Similarity

The simple $d^2$ statistic (1) is a special case of the Mahalanobis-type
$D^2$ statistic (2). If it can be assumed that profile elements are
uncorrelated and have equal variances, the inverse covariance matrix $C^{-1}$
in equation 2 will be a diagonal matrix proportional to an identity matrix
by a scalar constant. Taking that constant as unity (unit variances),

$D^2 = \mathbf{d}' C^{-1} \mathbf{d} = \mathbf{d}' I \mathbf{d} = \mathbf{d}'\mathbf{d} = d^2$

If, on the other hand, profile elements are not uncorrelated and variances
are not equal, then the simple $d^2$ statistic may be quite different from
the transformed $D^2$ statistic. As Cronbach and Gleser (1953) have pointed
out, the failure to take into account profile-element correlations results
in statistically orthogonal factors being weighted according to
the extent of representation in the profile, while in the transformed $D^2$
each orthogonal dimension is weighted equally.
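
A small numerical sketch may make the contrast between equations 1 and 2
concrete. It is restricted to two profile elements so that the inverse
covariance matrix can be written out directly; the profiles and both
covariance matrices are invented for the illustration.

```python
def simple_d2(a, b):
    """Equation 1: sum of squared element differences between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inverse_2x2(C):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = C
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_type_D2(a, b, C):
    """Equation 2: d' C^{-1} d for the difference vector d of two profiles."""
    diff = [x - y for x, y in zip(a, b)]
    Ci = inverse_2x2(C)
    return sum(diff[i] * Ci[i][j] * diff[j] for i in range(2) for j in range(2))

profile_1, profile_2 = (3.0, 1.0), (1.0, 2.0)

identity = [[1.0, 0.0], [0.0, 1.0]]      # uncorrelated, unit variances
correlated = [[1.0, 0.8], [0.8, 1.0]]    # elements correlated 0.8

d2 = simple_d2(profile_1, profile_2)
D2_uncorrelated = mahalanobis_type_D2(profile_1, profile_2, identity)
D2_correlated = mahalanobis_type_D2(profile_1, profile_2, correlated)
```

With the identity covariance matrix the two indices coincide; with the
correlated matrix the transformed index is several times larger for this
pair, because the two profiles differ in opposite directions on two
positively correlated elements.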

The Mahalanobis-type $D^2$ (equation 2) is a special case of the general
factor space $D_F^2$ (equation 3). If factoring is continued until p
orthogonal factors have been extracted from the p-order covariance (correlation)
matrix, the $D_F^2$ computed from the p transformed orthogonal factor variates
will be precisely the Mahalanobis-type $D^2$ for the same profiles. (This
equivalence will be illustrated only for the factoring of a covariance
matrix; however, it should be obvious that the same relationship holds
for factors extracted from a correlation matrix when loadings are
rescaled through multiplying by test standard deviations.) When the
covariance matrix is factored completely, it can be reproduced perfectly
from the factor loadings.


$C = F F'$


The factor score transformation matrix is obtained as

$W = C^{-1} F$.

The factor space distance function $D_F^2$ is computed by equation 3:

$D_F^2 = \mathbf{d}' W W' \mathbf{d} = \mathbf{d}' C^{-1} F F' C^{-1} \mathbf{d} = \mathbf{d}' C^{-1} \mathbf{d} = D^2$

Thus, the Mahalanobis-type $D^2$ is precisely equivalent to the factor space
$D_F^2$ in the special case where factoring has proceeded to extraction of all
p factors. Where factoring is terminated after r < p factors have been
extracted, the Mahalanobis-type $D^2$ may be substantially different from the
factor space $D_F^2$.
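
The equivalence can be verified numerically for a small case. The sketch
below assumes one particular complete factoring, the triangular
(square-root) decomposition, chosen only because it is easy to write out for
two variables; the covariance matrix and difference vector are invented, and
any other factoring with $C = F F'$ should give the same full-rank result.

```python
import math

def triangular_factor_2x2(C):
    """Lower-triangular F with C = F F' (square-root method, 2 x 2 case)."""
    f11 = math.sqrt(C[0][0])
    f21 = C[1][0] / f11
    f22 = math.sqrt(C[1][1] - f21 * f21)
    return [[f11, 0.0], [f21, f22]]

def inverse_2x2(C):
    (a, b), (c, d) = C
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

C = [[4.0, 1.2], [1.2, 1.0]]     # hypothetical covariance matrix
d = [1.5, -0.5]                  # hypothetical profile difference vector

F = triangular_factor_2x2(C)     # complete factoring: p = 2 factors
C_inv = inverse_2x2(C)
W = matmul(C_inv, F)             # factor score transformation W = C^{-1} F

# Differences in factor scores, W'd, one entry per factor.
score_diff = [sum(W[i][k] * d[i] for i in range(2)) for k in range(2)]

D2_factor_all = sum(s * s for s in score_diff)    # all p factors retained
D2_factor_first = score_diff[0] ** 2              # only r = 1 factor retained
D2_mahalanobis = sum(d[i] * C_inv[i][j] * d[j]
                     for i in range(2) for j in range(2))
```

Retaining both factors reproduces the Mahalanobis-type value, while stopping
after the first factor gives a smaller and clearly different distance.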

If we conceive that a matrix may contain only r < p reliable factors
and that additional factors may represent only error variance, we have a
basis for understanding the very considerable differences in reliability
of results which will be reported to exist between the alternative
approaches. If a matrix is factored completely and orthogonal factor
variates are all scaled to equal variance, the effect will be to increase
greatly the error involved in assessing profile similarities when, in
fact, there are only a few true common factors and many small error factors
(now stretched to unit length just like the true factors).
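
The stretching of small error factors can be illustrated with a deliberately
simple case. The sketch assumes that the factors are principal axes of the
covariance matrix, so that each rescaled factor-score difference is the
difference along that axis divided by the axis standard deviation; the two
variances and the difference vectors are invented for the example.

```python
def factor_space_D2(axis_diffs, axis_variances, r):
    """Sum over the first r axes of (difference along axis)^2 / axis variance."""
    return sum(axis_diffs[k] ** 2 / axis_variances[k] for k in range(r))

# One large "true" factor and one small "error" factor (hypothetical values).
axis_variances = [4.0, 0.01]

# The same pair of patients as seen by two raters: they agree on the true
# factor and differ only by a small amount on the error factor.
rater_a_diff = [2.0, 0.0]
rater_b_diff = [2.0, 0.3]

full_a = factor_space_D2(rater_a_diff, axis_variances, r=2)
full_b = factor_space_D2(rater_b_diff, axis_variances, r=2)
truncated_a = factor_space_D2(rater_a_diff, axis_variances, r=1)
truncated_b = factor_space_D2(rater_b_diff, axis_variances, r=1)
```

With both axes rescaled to unit variance, a disagreement of 0.3 on the error
axis changes the distance by a factor of about ten; when only the first
factor is retained, the two raters give identical distances.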

These results appear to militate against my previous recommendation
of the complete Mahalanobis-type $D^2$, not because it is unimportant to
establish coincidence with geometric properties of orthogonality and
equal unit coordinate axes, but because the coordinate axes need to be
defined in terms of true, non-error factors. Since the Mahalanobis-type
orthogonal transformation is equivalent to complete orthogonal factoring,
an interesting question arises concerning how many transformed orthogonal
variates should be used in computing interprofile distances.


Study  of 


Dlstanc 


In psychiatric symptom ratings, the degree of agreement between two
independent observers represents an important kind of reliability. Unless
two observers can agree concerning the level of symptomatology present in
each patient, there is little basis for confidence that the ratings
represent true status of the patients. Where psychiatric rating profiles
are used as a basis for clustering of patients with the hope of identifying
naturally occurring homogeneous modal types, it is important to know
the extent to which the same cluster results can be expected to result
from ratings made by different observers. If the relative distances
between patients differ widely from one independent observer to the next,
one can have little confidence that the cluster results really represent
fundamental types of patients.


The present investigation was undertaken on the assumption that
some types of profile similarity indices may be more invariant (reliable)
across different observers than others. The investigation involved
comparing interprofile distances derived from ratings by one observer with
interprofile distances derived from ratings by another observer. This
comparison was made for seven different measures of profile similarity,
representing variations of the three basic models described above ($d^2$,
$D^2$ and $D_F^2$). The analyses were replicated across seven independent random
samples of 20 patients, each sample yielding n(n-1)/2 = 190 interprofile
distances for ratings by each observer. The seven different distance
indices for which interrater reliabilities were evaluated are shown in
Table 1.

The first three series of analyses involved interprofile distances
computed using only the information present in each sample of 20 profiles
being analyzed. In the case of the simple $d^2$ index, this is all the
information that can be used since no transformation of the original variates
is imposed. With the Mahalanobis $D^2$, the original variates are
transformed to a set of mathematical variates which are statistically
orthogonal in some population or in some sample. When interprofile distances
are computed using only the information present in the sample, the
transformed variates are statistically orthogonal within that sample. Such
an orthogonal basis contains the sampling error present in the specific
small-sample covariance matrix; hence, it may be different from one rater
to the next. Variations in the covariance matrix, thus, contribute to
variability of $D^2$ results from one rater to the next, even within the
same sample of patients. Using the factor space $D_F^2$ model, the original
variates are transformed to a set of r < p mathematical variates which
are statistically orthogonal in the sample or population represented in
the correlation (covariance) matrix which is factored. When the
orthogonal factor variates are derived from analysis of the small-sample
correlation (covariance) matrix involving only the cases for which
interprofile distances are being computed, the factor variates will be
influenced by sampling variability in the covariances. Since the
covariance matrix will differ from one rater to the next, some variability in
distance function results may be introduced. On the other hand, common
factors tend to be more stable than individual variables so that
reliability may be increased.

The three types of distance indices were computed for all possible
pairs of patients in the seven samples, first using ratings by one rater
and then using ratings for the same patients made by another rater. The
n(n-1)/2 = 190 resulting paired distance indices in each sample were
intercorrelated for the two raters. In this instance, the rank correlation
coefficient was employed as a simple descriptive index of the relative
similarities of distance indices computed from ratings by the two
independent observers. (No assumptions concerning distributions of these
coefficients were made.) The results of correlations between paired
distance measures for the three types of indices are plotted in Figure 1
for the seven independent samples of patients.
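
The comparison just described can be sketched as follows for the simple
$d^2$ index. The text does not say which rank correlation coefficient was
used; Spearman's coefficient (with tied values given their average rank) is
assumed here, and the rating profiles are invented for the illustration.

```python
import math
from itertools import combinations

def pair_distances(profiles):
    """Simple d^2 for every pair of profiles, n(n-1)/2 values in all."""
    return [sum((a - b) ** 2 for a, b in zip(p, q))
            for p, q in combinations(profiles, 2)]

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def rank_correlation(u, v):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(u), ranks(v))

# Hypothetical ratings of the same four patients by two raters; rater B
# uses a scale twice as wide, so the ordering of distances is unchanged.
rater_a = [(1.0, 2.0), (2.0, 2.0), (5.0, 6.0), (6.0, 5.0)]
rater_b = [(2.0, 4.0), (4.0, 4.0), (10.0, 12.0), (12.0, 10.0)]

dist_a = pair_distances(rater_a)
dist_b = pair_distances(rater_b)
interrater_rho = rank_correlation(dist_a, dist_b)
```

Because the coefficient depends only on the ordering of the distances, a
rater who uses the scale more expansively but orders the patient pairs in the
same way is counted as being in perfect agreement.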

The results indicate that the orderings of simple $d^2$ indices were
consistently most similar for the two independent raters. The factor
space $D_F^2$ results were less consistent from rater to rater when the factor
variates were defined in terms of the individual sample (N=20) correlation
matrices for each rater separately. Finally, the ordering of
Mahalanobis-type $D^2$ indices was almost entirely lacking in consistency from
rater to rater when the individual sample (N=20) covariance matrix was used in
computing $D^2$. In fact, inter-rater correlations were equal to or less than
zero in four out of the seven samples.


Figure  1 


The results of this first series of analyses lead to the conclusion
that the simple d² index of profile similarity is significantly (7 out
of 7) more invariant across raters than either the factor space D_F² or
the Mahalanobis-type D² when only the information contained in the
profiles being clustered is used. The results further suggest that the
Mahalanobis-type D² is entirely lacking in reliability across raters when
the small-sample covariance matrix is employed in the calculations. As
previously discussed, this is due to the fact that the Mahalanobis-type
D² is equivalent to factoring the correlation (covariance) matrix
completely and then equating the variance of all factor variates, whether
true common factors or error factors.
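The equivalence invoked here can be written out in a few lines. The notation is an assumption introduced for illustration: C is taken to be positive definite with latent roots λ_k and orthonormal latent vectors v_k (the columns of V), and d is the vector of differences between two profiles.

```latex
\begin{aligned}
C   &= V \Lambda V' = \sum_{k=1}^{p} \lambda_k \, v_k v_k' , \\
D^2 &= d' C^{-1} d = d' V \Lambda^{-1} V' d
     = \sum_{k=1}^{p} \frac{(v_k' d)^2}{\lambda_k} .
\end{aligned}
```

Each principal-axis difference v_k'd is divided by its own variance λ_k, so all p factor variates enter with equal weight. A dimension with a very small root, which in a sample of 20 cases is likely to be mostly error, therefore counts as much as a leading common factor.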

The next series of analyses was undertaken to evaluate the effect
of increasing the stability of the covariance matrix used in
Mahalanobis-type D² calculations. A single stable covariance matrix based
on a larger sample (N=280) was computed, and the inverse of this
covariance matrix was used in calculating inter-profile distances in all
samples for both raters. This procedure is equivalent to transforming all
rating profiles using a common transformation matrix. It is like
factoring a stable population correlation (covariance) matrix completely,
equating variances of all factor variates, and using these factor variate
equations in transforming all ratings. The results of this procedure were
correlated for the two independent raters in the seven samples. Results
are presented in Figure 2. Use of the more stable common covariance
matrix to obtain the orthogonal transformation, rather than obtaining a
separate transformation matrix for each rater, resulted in increased
inter-rater reliability for the Mahalanobis-type D² indices; however, the
inter-rater reliability was still found to be quite low. For comparison,
results obtained for the same data using the simple d² index are
reproduced in Figure 2. Even where a single orthogonal transformation is
used for the Mahalanobis-type D² calculations, the simple d² index
evidences considerably more stable results from one rater to the next.
Again, this result is presumably due to the increased emphasis on error
factors resulting from a transformation which is equivalent to total
factoring of a matrix containing only four substantial principal factors
and 12 roots less than unity.
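A minimal sketch of the common-matrix procedure is given below, using NumPy. The function names and the synthetic arrays are hypothetical stand-ins; only the shapes (280 cases, 20 cases per sample, 16 rating scales) follow the text.

```python
import numpy as np

def inverse_covariance(ratings):
    """Inverse of the sample covariance matrix (rows = cases)."""
    return np.linalg.inv(np.cov(np.asarray(ratings, dtype=float), rowvar=False))

def pairwise_mahalanobis_d2(profiles, C_inv):
    """D^2 = d'C^{-1}d for every unordered pair of profiles."""
    X = np.asarray(profiles, dtype=float)
    i, j = np.triu_indices(len(X), k=1)         # all pairs with i < j
    diffs = X[i] - X[j]
    return np.einsum("ij,jk,ik->i", diffs, C_inv, diffs)

# Synthetic placeholders, not the ratings analyzed in the paper.
rng = np.random.default_rng(0)
large_sample = rng.normal(size=(280, 16))
rater_a = rng.normal(size=(20, 16))
rater_b = rater_a + 0.5 * rng.normal(size=(20, 16))

# One inverse, computed once, is applied to both raters' profiles.
C_inv = inverse_covariance(large_sample)
d2_a = pairwise_mahalanobis_d2(rater_a, C_inv)
d2_b = pairwise_mahalanobis_d2(rater_b, C_inv)
print(d2_a.shape, d2_b.shape)
```

With only 20 cases and 16 variables, a covariance matrix computed within a single sample is close to singular, so its inverse is dominated by the smallest roots. Fixing the inverse from the larger sample removes that source of instability but keeps the equal weighting of all 16 dimensions.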


Figure  2 


A final series of analyses was undertaken to evaluate the inter-rater
reliability of factor space D_F² indices computed from factor variates
derived from a single large-sample (N=280) correlation matrix. Factor
score transformation vectors were computed for one, two and four principal
factors of the common large-sample correlation matrix. Inter-rater
correlations of D_F² values were computed within each sample of 20 cases
using the same factor score transformation equations. Results are
presented in Figure 3. For comparison, the simple d² results are also
reproduced in this figure.


Figure  3 


While the consistency from sample to sample is not as pronounced,
the general trend is for the factor space D_F² coefficients to evidence
greater invariance between raters than the simple d² statistic. Where
D_F² indices based on the four principal factors corresponding to latent
roots greater than unity were analyzed, the inter-rater consistency was
higher than for the simple d² statistic in six out of the seven
independent samples. The average inter-rater consistency increased when
D_F² indices were based on only the first two principal factors, and
increased still more with use of only the first principal factor;
however, the variability from sample to sample increased as fewer factors
formed the basis for D_F² calculations. As has already been pointed out,
the D_F² statistic approaches the Mahalanobis-type D² as the number of
factors approaches the total number of profile components. For
comparison, D_F² inter-rater correlations based on all 16 principal
factors have been entered in Figure 3 also.
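The limiting relation between D_F² and D² can be checked numerically. The sketch below assumes unit-variance principal factors of a positive definite matrix S; the function name is hypothetical. It returns D_F² for every number of retained factors r = 1, ..., p.

```python
import numpy as np

def d2_by_number_of_factors(d, S):
    """D_F^2 for r = 1, ..., p retained principal factors of S.

    Element r-1 is the sum of (v_k'd)^2 / lambda_k over the r largest
    latent roots, i.e. d'WW'd with unit-variance factor variates.
    """
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]           # largest root first
    scores = eigvecs[:, order].T @ np.asarray(d, dtype=float)
    return np.cumsum(scores ** 2 / eigvals[order])
```

The returned values never decrease as factors are added, and the last one (r = p) equals d'S⁻¹d, the Mahalanobis-type D² computed from the same matrix.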

From these results it is concluded that the use of a stable
orthogonal transformation representing only the non-error factors of a
large-sample correlation matrix will tend to result in more reliable
profile similarity indices, that there is generally an inverse
relationship between number of factors used in defining the space and the
reliability of distance indices, but that the simple d² statistic
compares favorably to the best profile similarity measures, as far as
inter-rater consistency is concerned.


Table 1

Seven Indices Employed in Comparing Relative Invariance of Distances
Computed from Rating Profiles Provided by Two Independent Observers

                          Simple d²     Mahalanobis D²    Factor Space D_F²

Covariance matrix         d² = d'd      D² = d'C⁻¹d       D_F² = d'WW'd
based on N=20                                             (four factors)

Covariance matrix                       D² = d'C⁻¹d       D_F² = d'WW'd
based on N=280                                            (four factors)

  Two principal factors                                   D_F² = d'WW'd

  One principal factor                                    D_F² = d'ww'd

Simple d²:          Covariance matrix not involved; all variables enter
                    into distance calculations.

Mahalanobis D²:     First series of analyses involved use of C⁻¹
                    calculated from the particular sample of 20 cases for
                    whom inter-profile distances were calculated. Second
                    series of analyses involved use of a constant C⁻¹
                    based on larger sample of 280 cases.

Factor Space D_F²:  First series of analyses involved first four
                    principal factors of sample (N=20) correlation
                    matrix. Second series of analyses involved first four
                    principal factors of constant correlation matrix
                    based on large sample of 280 cases. Third and fourth
                    series involved two and one principal factors of
                    large-sample matrix.


Legend entries for Figures 1-3:

Mahalanobis-type D² based on small-sample (N = 20) covariances.
Mahalanobis-type D² based on common large-sample (N = 280) covariances.
Factor D_F² based on four common factors of large-sample (N = 280)
correlation matrix.
Factor D_F² based on two principal factors of large-sample matrix.
Factor D_F² based on one principal factor of large-sample matrix.
Factor D_F² based on all 16 principal factors of large-sample matrix.